{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "89b89f64d8f8053d",
   "metadata": {
    "collapsed": false,
    "id": "89b89f64d8f8053d",
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
     "# Efficient LoRA Fine-Tuning of ChatGLM3-6B on a Single GPU\n",
     "This cookbook walks developers through LoRA fine-tuning of the ChatGLM3-6B model on the `AdvertiseGen` dataset, giving the model specialized ad-copy generation capabilities.\n",
     "\n",
     "## Hardware Requirements\n",
     "GPU memory: 24 GB or more (an NVIDIA GPU with an sm80-class architecture or newer, such as the 30 series or A10, is recommended)\n",
     "RAM: 16 GB"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7bd9a514ed09ea6",
   "metadata": {
    "collapsed": false,
    "id": "a7bd9a514ed09ea6",
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
     "## 0. Environment Check\n",
     "First, check the working directory to make sure the code is running inside `finetune_demo`.\n",
     "Also make sure the dependencies listed in `requirements.txt` are installed.\n",
     "\n",
     "> This demo does not use the deepspeed and mpi4py dependencies; if you run into problems installing them, you can simply skip them."
   ]
  },
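  {
   "cell_type": "code",
   "execution_count": null,
   "id": "env_sanity_check",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sanity check (illustrative; assumes torch, transformers, and peft\n",
    "# from requirements.txt are installed): confirm the key libraries import\n",
    "# cleanly and that a CUDA GPU is visible before starting the fine-tune.\n",
    "import torch\n",
    "import transformers\n",
    "import peft\n",
    "\n",
    "print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())\n",
    "print('transformers', transformers.__version__, '| peft', peft.__version__)"
   ]
  },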
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f7703109d1443346",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-04-14T05:29:22.200365Z",
     "start_time": "2024-04-14T05:29:22.080929Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/ubuntu/workspace/ChatGLM3/finetune_demo\n"
     ]
    }
   ],
   "source": [
    "!pwd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f50e92810011977",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
     "## 1. Prepare the Dataset\n",
     "We fine-tune on the AdvertiseGen dataset. Download the preprocessed AdvertiseGen dataset from [Google Drive](https://drive.google.com/file/d/13_vf0xRTQsyneRKdD1bZIr93vBGOczrk/view?usp=sharing) or [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/b3f119a008264b1cabd1/?dl=1), then place the extracted AdvertiseGen directory under `data/` in this directory, for example:\n",
     "> /media/zr/Data/Code/ChatGLM3/finetune_demo/data/AdvertiseGen"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "initial_id",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-04-14T05:29:23.809255Z",
     "start_time": "2024-04-14T05:29:22.202731Z"
    },
    "cellView": "form",
    "id": "initial_id"
   },
   "outputs": [],
   "source": [
    "import json\n",
    "from typing import Union\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def _resolve_path(path: Union[str, Path]) -> Path:\n",
    "    return Path(path).expanduser().resolve()\n",
    "\n",
    "\n",
     "def _mkdir(dir_name: Union[str, Path]):\n",
     "    # Create the directory (and any missing parents) if it does not exist.\n",
     "    dir_name = _resolve_path(dir_name)\n",
     "    if not dir_name.is_dir():\n",
     "        dir_name.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "\n",
     "def convert_adgen(data_dir: Union[str, Path], save_dir: Union[str, Path]):\n",
     "    # Convert each JSONL record from AdvertiseGen's {content, summary} format\n",
     "    # into the chat-style {'conversations': [...]} format expected by the trainer.\n",
     "    def _convert(in_file: Path, out_file: Path):\n",
     "        _mkdir(out_file.parent)\n",
    "        with open(in_file, encoding='utf-8') as fin:\n",
    "            with open(out_file, 'wt', encoding='utf-8') as fout:\n",
    "                for line in fin:\n",
    "                    dct = json.loads(line)\n",
    "                    sample = {'conversations': [{'role': 'user', 'content': dct['content']},\n",
    "                                                {'role': 'assistant', 'content': dct['summary']}]}\n",
    "                    fout.write(json.dumps(sample, ensure_ascii=False) + '\\n')\n",
    "\n",
    "    data_dir = _resolve_path(data_dir)\n",
    "    save_dir = _resolve_path(save_dir)\n",
    "\n",
    "    train_file = data_dir / 'train.json'\n",
    "    if train_file.is_file():\n",
    "        out_file = save_dir / train_file.relative_to(data_dir)\n",
    "        _convert(train_file, out_file)\n",
    "\n",
    "    dev_file = data_dir / 'dev.json'\n",
    "    if dev_file.is_file():\n",
    "        out_file = save_dir / dev_file.relative_to(data_dir)\n",
    "        _convert(dev_file, out_file)\n",
    "\n",
    "\n",
    "convert_adgen('data/AdvertiseGen', 'data/AdvertiseGen_fix')"
   ]
  },
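  {
   "cell_type": "code",
   "execution_count": null,
   "id": "adgen_format_check",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional check (illustrative, assuming the conversion above has run):\n",
    "# print the first converted training sample to confirm it follows the\n",
    "# {'conversations': [{'role': ..., 'content': ...}, ...]} chat format.\n",
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "fixed_train = Path('data/AdvertiseGen_fix/train.json')\n",
    "if fixed_train.is_file():\n",
    "    with open(fixed_train, encoding='utf-8') as f:\n",
    "        sample = json.loads(f.readline())\n",
    "    print(json.dumps(sample, ensure_ascii=False, indent=2))"
   ]
  },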
  {
   "cell_type": "markdown",
   "id": "a1b7a99923349056",
   "metadata": {
    "collapsed": false,
    "id": "a1b7a99923349056",
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
     "## 2. Start Fine-Tuning from the Command Line (Using LoRA)\n",
     "Next, we simply pass the configured parameters to the program as command-line arguments to run an efficient LoRA fine-tune."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "17c87410a24d844f",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-04-14T06:23:41.282431Z",
     "start_time": "2024-04-14T05:29:23.810692Z"
    },
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "17c87410a24d844f",
    "outputId": "e347fc7d-875e-40c9-c682-3e064100476b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n",
      "Setting eos_token is not supported, use the default one.\n",
      "Setting pad_token is not supported, use the default one.\n",
      "Setting unk_token is not supported, use the default one.\n",
      "Downloading shards:   0%|                                 | 0/7 [00:00<?, ?it/s]\n",
      "model-00002-of-00007.safetensors:   0%|             | 0.00/1.97G [00:00<?, ?B/s]\u001b[A\n",
      "model-00002-of-00007.safetensors: 100%|████| 1.97G/1.97G [08:03<00:00, 4.07MB/s]\u001b[A\n",
      "Downloading shards:  29%|██████▊                 | 2/7 [08:04<20:10, 242.12s/it]\n",
      "model-00003-of-00007.safetensors:   0%|             | 0.00/1.93G [00:00<?, ?B/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  52%|██▌  | 996M/1.93G [04:00<03:30, 4.43MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  52%|██  | 1.01G/1.93G [04:05<04:45, 3.22MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  53%|██  | 1.02G/1.93G [04:08<04:18, 3.51MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  53%|██▏ | 1.03G/1.93G [04:10<03:59, 3.75MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  54%|██▏ | 1.04G/1.93G [04:13<03:45, 3.94MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  54%|██▏ | 1.05G/1.93G [04:15<03:35, 4.08MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  55%|██▏ | 1.06G/1.93G [04:17<03:27, 4.18MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  55%|██▏ | 1.07G/1.93G [04:20<03:21, 4.27MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  56%|██▏ | 1.08G/1.93G [04:22<03:16, 4.32MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  57%|██▎ | 1.09G/1.93G [04:24<03:12, 4.35MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  57%|██▎ | 1.10G/1.93G [04:27<03:08, 4.38MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  58%|██▎ | 1.11G/1.93G [04:29<03:05, 4.40MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  58%|██▎ | 1.12G/1.93G [04:31<03:02, 4.42MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  59%|██▎ | 1.13G/1.93G [04:34<02:59, 4.42MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  59%|██▎ | 1.14G/1.93G [04:36<02:57, 4.43MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  60%|██▍ | 1.15G/1.93G [04:41<04:00, 3.22MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  60%|██▍ | 1.16G/1.93G [04:44<03:37, 3.51MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  61%|██▍ | 1.17G/1.93G [04:46<03:20, 3.76MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  61%|██▍ | 1.18G/1.93G [04:48<03:08, 3.94MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  62%|██▍ | 1.20G/1.93G [04:51<02:59, 4.08MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  63%|██▌ | 1.21G/1.93G [04:53<02:52, 4.19MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  63%|██▌ | 1.22G/1.93G [04:56<02:47, 4.26MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  64%|██▌ | 1.23G/1.93G [04:58<02:42, 4.30MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  64%|██▌ | 1.24G/1.93G [05:00<02:38, 4.35MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  65%|██▌ | 1.25G/1.93G [05:03<02:35, 4.38MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  65%|██▌ | 1.26G/1.93G [05:05<02:31, 4.41MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  66%|██▋ | 1.27G/1.93G [05:07<02:29, 4.42MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  66%|██▋ | 1.28G/1.93G [05:10<02:26, 4.43MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  67%|██▋ | 1.29G/1.93G [05:12<02:23, 4.44MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  67%|██▋ | 1.30G/1.93G [05:17<03:14, 3.23MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  68%|██▋ | 1.31G/1.93G [05:20<02:54, 3.53MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  69%|██▋ | 1.32G/1.93G [05:22<02:41, 3.76MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  69%|██▊ | 1.33G/1.93G [05:24<02:31, 3.94MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  70%|██▊ | 1.34G/1.93G [05:27<02:23, 4.08MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  70%|██▊ | 1.35G/1.93G [05:29<02:17, 4.19MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  71%|██▊ | 1.36G/1.93G [05:31<02:12, 4.26MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  71%|██▊ | 1.37G/1.93G [05:34<02:08, 4.32MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  72%|██▊ | 1.38G/1.93G [05:36<02:04, 4.36MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  72%|██▉ | 1.39G/1.93G [05:39<02:01, 4.39MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  73%|██▉ | 1.41G/1.93G [05:41<01:58, 4.40MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  73%|██▉ | 1.42G/1.93G [05:43<01:55, 4.42MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  74%|██▉ | 1.43G/1.93G [05:46<01:53, 4.42MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  75%|██▉ | 1.44G/1.93G [05:48<01:50, 4.43MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  75%|███ | 1.45G/1.93G [05:53<02:29, 3.22MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  76%|███ | 1.46G/1.93G [05:56<02:13, 3.52MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  76%|███ | 1.47G/1.93G [05:58<02:02, 3.75MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  77%|███ | 1.48G/1.93G [06:00<01:54, 3.94MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  77%|███ | 1.49G/1.93G [06:03<01:47, 4.08MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  78%|███ | 1.50G/1.93G [06:05<01:42, 4.19MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  78%|███▏| 1.51G/1.93G [06:07<01:37, 4.26MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  79%|███▏| 1.52G/1.93G [06:10<01:34, 4.31MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  79%|███▏| 1.53G/1.93G [06:12<01:31, 4.34MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  80%|███▏| 1.54G/1.93G [06:14<01:28, 4.38MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  81%|███▏| 1.55G/1.93G [06:17<01:25, 4.41MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  81%|███▏| 1.56G/1.93G [06:19<01:22, 4.42MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  82%|███▎| 1.57G/1.93G [06:22<01:20, 4.43MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  82%|███▎| 1.58G/1.93G [06:24<01:17, 4.44MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  83%|███▎| 1.59G/1.93G [06:29<01:43, 3.22MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  83%|███▎| 1.60G/1.93G [06:32<01:31, 3.51MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  84%|███▎| 1.61G/1.93G [06:34<01:24, 3.71MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  84%|███▎| 1.63G/1.93G [06:36<01:16, 3.96MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  85%|███▍| 1.64G/1.93G [06:39<01:11, 4.09MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  85%|███▍| 1.65G/1.93G [06:41<01:07, 4.19MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  86%|███▍| 1.66G/1.93G [06:43<01:03, 4.27MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  87%|███▍| 1.67G/1.93G [06:46<01:00, 4.32MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  87%|███▍| 1.68G/1.93G [06:48<00:57, 4.37MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  88%|███▌| 1.69G/1.93G [06:50<00:54, 4.38MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  88%|███▌| 1.70G/1.93G [06:53<00:51, 4.41MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  89%|███▌| 1.71G/1.93G [06:55<00:49, 4.40MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  89%|███▌| 1.72G/1.93G [06:57<00:46, 4.44MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  90%|███▌| 1.73G/1.93G [07:03<01:00, 3.24MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  90%|███▌| 1.74G/1.93G [07:05<00:53, 3.52MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  91%|███▋| 1.75G/1.93G [07:08<00:46, 3.76MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  91%|███▋| 1.76G/1.93G [07:10<00:42, 3.94MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  92%|███▋| 1.77G/1.93G [07:12<00:38, 4.08MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  92%|███▋| 1.78G/1.93G [07:15<00:35, 4.09MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  93%|███▋| 1.79G/1.93G [07:17<00:31, 4.29MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  94%|███▋| 1.80G/1.93G [07:19<00:28, 4.33MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  94%|███▊| 1.81G/1.93G [07:22<00:25, 4.37MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  95%|███▊| 1.82G/1.93G [07:24<00:23, 4.40MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  95%|███▊| 1.84G/1.93G [07:26<00:20, 4.42MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  96%|███▊| 1.85G/1.93G [07:29<00:18, 4.43MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  96%|███▊| 1.86G/1.93G [07:31<00:16, 4.43MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  97%|███▊| 1.87G/1.93G [07:33<00:13, 4.45MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  97%|███▉| 1.88G/1.93G [07:39<00:15, 3.23MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  98%|███▉| 1.89G/1.93G [07:41<00:11, 3.52MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  98%|███▉| 1.90G/1.93G [07:43<00:07, 3.75MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors:  99%|███▉| 1.91G/1.93G [07:46<00:04, 3.93MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors: 100%|███▉| 1.92G/1.93G [07:48<00:02, 4.08MB/s]\u001b[A\n",
      "model-00003-of-00007.safetensors: 100%|████| 1.93G/1.93G [07:50<00:00, 4.10MB/s]\u001b[A\n",
      "Downloading shards:  43%|██████████▎             | 3/7 [15:55<22:31, 337.77s/it]\n",
      "model-00004-of-00007.safetensors:   0%|             | 0.00/1.82G [00:00<?, ?B/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   1%|    | 10.5M/1.82G [00:01<04:22, 6.88MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   1%|    | 21.0M/1.82G [00:03<05:36, 5.33MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   2%|    | 31.5M/1.82G [00:06<06:04, 4.89MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   2%|    | 41.9M/1.82G [00:08<06:17, 4.70MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   3%|    | 52.4M/1.82G [00:10<06:22, 4.61MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   3%|▏   | 62.9M/1.82G [00:13<06:24, 4.56MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   4%|▏   | 73.4M/1.82G [00:15<06:25, 4.52MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   5%|▏   | 83.9M/1.82G [00:17<06:24, 4.50MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   5%|▏   | 94.4M/1.82G [00:23<08:55, 3.21MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   6%|▎    | 105M/1.82G [00:25<08:06, 3.51MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   6%|▎    | 115M/1.82G [00:27<07:33, 3.75MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   7%|▎    | 126M/1.82G [00:30<07:08, 3.95MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   8%|▍    | 136M/1.82G [00:32<06:50, 4.08MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   8%|▍    | 147M/1.82G [00:35<06:38, 4.19MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   9%|▍    | 157M/1.82G [00:37<06:28, 4.27MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:   9%|▍    | 168M/1.82G [00:39<06:21, 4.32MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  10%|▍    | 178M/1.82G [00:42<06:15, 4.36MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  10%|▌    | 189M/1.82G [00:44<06:10, 4.39MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  11%|▌    | 199M/1.82G [00:46<06:07, 4.40MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  12%|▌    | 210M/1.82G [00:49<06:03, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  12%|▌    | 220M/1.82G [00:51<06:01, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  13%|▋    | 231M/1.82G [00:54<06:05, 4.34MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  13%|▋    | 241M/1.82G [00:59<08:05, 3.24MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  14%|▋    | 252M/1.82G [01:01<07:23, 3.53MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  14%|▋    | 262M/1.82G [01:03<06:52, 3.76MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  15%|▊    | 273M/1.82G [01:06<06:30, 3.95MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  16%|▊    | 283M/1.82G [01:08<06:15, 4.08MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  16%|▊    | 294M/1.82G [01:10<06:03, 4.19MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  17%|▊    | 304M/1.82G [01:13<05:55, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  17%|▊    | 315M/1.82G [01:15<05:47, 4.32MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  18%|▉    | 325M/1.82G [01:18<05:42, 4.35MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  18%|▉    | 336M/1.82G [01:20<05:37, 4.39MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  19%|▉    | 346M/1.82G [01:22<05:33, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  20%|▉    | 357M/1.82G [01:25<05:29, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  20%|█    | 367M/1.82G [01:27<05:27, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  21%|█    | 377M/1.82G [01:29<05:24, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  21%|█    | 388M/1.82G [01:35<07:22, 3.22MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  22%|█    | 398M/1.82G [01:37<06:42, 3.52MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  23%|█▏   | 409M/1.82G [01:39<06:14, 3.76MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  23%|█▏   | 419M/1.82G [01:42<05:54, 3.94MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  24%|█▏   | 430M/1.82G [01:44<05:39, 4.08MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  24%|█▏   | 440M/1.82G [01:46<05:28, 4.18MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  25%|█▏   | 451M/1.82G [01:49<05:20, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  25%|█▎   | 461M/1.82G [01:51<05:13, 4.31MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  26%|█▎   | 472M/1.82G [01:54<05:08, 4.35MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  27%|█▎   | 482M/1.82G [01:56<05:04, 4.38MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  27%|█▎   | 493M/1.82G [01:58<05:00, 4.40MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  28%|█▍   | 503M/1.82G [02:01<04:57, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  28%|█▍   | 514M/1.82G [02:03<04:54, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  29%|█▍   | 524M/1.82G [02:08<06:40, 3.22MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  29%|█▍   | 535M/1.82G [02:11<06:03, 3.52MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  30%|█▌   | 545M/1.82G [02:13<05:38, 3.75MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  31%|█▌   | 556M/1.82G [02:15<05:19, 3.94MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  31%|█▌   | 566M/1.82G [02:18<05:05, 4.08MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  32%|█▌   | 577M/1.82G [02:20<04:55, 4.19MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  32%|█▌   | 587M/1.82G [02:22<04:48, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  33%|█▋   | 598M/1.82G [02:25<04:41, 4.32MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  34%|█▋   | 608M/1.82G [02:27<04:37, 4.35MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  34%|█▋   | 619M/1.82G [02:29<04:33, 4.38MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  35%|█▋   | 629M/1.82G [02:32<04:33, 4.34MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  35%|█▊   | 640M/1.82G [02:34<04:25, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  36%|█▊   | 650M/1.82G [02:37<04:22, 4.44MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  36%|█▊   | 661M/1.82G [02:39<04:20, 4.44MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  37%|█▊   | 671M/1.82G [02:44<05:53, 3.23MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  38%|█▉   | 682M/1.82G [02:47<05:22, 3.52MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  38%|█▉   | 692M/1.82G [02:49<04:58, 3.76MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  39%|█▉   | 703M/1.82G [02:51<04:42, 3.94MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  39%|█▉   | 713M/1.82G [02:54<04:29, 4.09MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  40%|█▉   | 724M/1.82G [02:56<04:20, 4.19MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  40%|██   | 734M/1.82G [02:58<04:13, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  41%|██   | 744M/1.82G [03:01<04:07, 4.32MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  42%|██   | 755M/1.82G [03:03<04:03, 4.35MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  42%|██   | 765M/1.82G [03:05<03:59, 4.37MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  43%|██▏  | 776M/1.82G [03:08<03:55, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  43%|██▏  | 786M/1.82G [03:10<03:53, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  44%|██▏  | 797M/1.82G [03:12<03:49, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  44%|██▏  | 807M/1.82G [03:15<03:47, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  45%|██▎  | 818M/1.82G [03:20<05:09, 3.22MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  46%|██▎  | 828M/1.82G [03:23<04:41, 3.51MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  46%|██▎  | 839M/1.82G [03:25<04:20, 3.75MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  47%|██▎  | 849M/1.82G [03:27<04:05, 3.94MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  47%|██▎  | 860M/1.82G [03:30<03:54, 4.08MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  48%|██▍  | 870M/1.82G [03:32<03:45, 4.18MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  49%|██▍  | 881M/1.82G [03:34<03:39, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  49%|██▍  | 891M/1.82G [03:37<03:34, 4.31MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  50%|██▍  | 902M/1.82G [03:39<03:29, 4.35MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  50%|██▌  | 912M/1.82G [03:41<03:25, 4.38MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  51%|██▌  | 923M/1.82G [03:44<03:22, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  51%|██▌  | 933M/1.82G [03:46<03:19, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  52%|██▌  | 944M/1.82G [03:48<03:17, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  53%|██▋  | 954M/1.82G [03:51<03:14, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  53%|██▋  | 965M/1.82G [03:56<04:24, 3.22MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  54%|██▋  | 975M/1.82G [03:58<03:58, 3.52MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  54%|██▋  | 986M/1.82G [04:01<03:41, 3.75MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  55%|██▋  | 996M/1.82G [04:03<03:27, 3.94MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  55%|██▏ | 1.01G/1.82G [04:06<03:18, 4.07MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  56%|██▏ | 1.02G/1.82G [04:08<03:10, 4.18MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  57%|██▎ | 1.03G/1.82G [04:10<03:05, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  57%|██▎ | 1.04G/1.82G [04:13<03:00, 4.31MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  58%|██▎ | 1.05G/1.82G [04:15<02:56, 4.35MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  58%|██▎ | 1.06G/1.82G [04:17<02:52, 4.38MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  59%|██▎ | 1.07G/1.82G [04:20<02:49, 4.40MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  59%|██▍ | 1.08G/1.82G [04:22<02:46, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  60%|██▍ | 1.09G/1.82G [04:24<02:43, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  61%|██▍ | 1.10G/1.82G [04:27<02:41, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  61%|██▍ | 1.11G/1.82G [04:32<03:37, 3.23MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  62%|██▍ | 1.12G/1.82G [04:34<03:17, 3.52MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  62%|██▍ | 1.13G/1.82G [04:37<03:01, 3.75MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  63%|██▌ | 1.14G/1.82G [04:39<02:50, 3.94MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  64%|██▌ | 1.15G/1.82G [04:42<02:43, 4.06MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  64%|██▌ | 1.16G/1.82G [04:44<02:35, 4.18MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  65%|██▌ | 1.17G/1.82G [04:46<02:30, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  65%|██▌ | 1.18G/1.82G [04:49<02:26, 4.31MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  66%|██▋ | 1.20G/1.82G [04:51<02:25, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  66%|██▋ | 1.21G/1.82G [04:53<02:18, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  67%|██▋ | 1.22G/1.82G [04:56<02:15, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  68%|██▋ | 1.23G/1.82G [04:58<02:12, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  68%|██▋ | 1.24G/1.82G [05:01<02:12, 4.36MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  69%|██▋ | 1.25G/1.82G [05:06<02:55, 3.24MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  69%|██▊ | 1.26G/1.82G [05:08<02:38, 3.52MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  70%|██▊ | 1.27G/1.82G [05:10<02:25, 3.76MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  70%|██▊ | 1.28G/1.82G [05:13<02:18, 3.87MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  71%|██▊ | 1.29G/1.82G [05:15<02:08, 4.10MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  72%|██▊ | 1.30G/1.82G [05:17<02:02, 4.21MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  72%|██▉ | 1.31G/1.82G [05:20<01:57, 4.28MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  73%|██▉ | 1.32G/1.82G [05:22<01:54, 4.33MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  73%|██▉ | 1.33G/1.82G [05:25<01:50, 4.36MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  74%|██▉ | 1.34G/1.82G [05:27<01:47, 4.39MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  75%|██▉ | 1.35G/1.82G [05:29<01:44, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  75%|███ | 1.36G/1.82G [05:32<01:42, 4.41MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  76%|███ | 1.37G/1.82G [05:34<01:39, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  76%|███ | 1.38G/1.82G [05:36<01:37, 4.44MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  77%|███ | 1.39G/1.82G [05:42<02:10, 3.22MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  77%|███ | 1.41G/1.82G [05:44<01:56, 3.52MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  78%|███ | 1.42G/1.82G [05:46<01:46, 3.75MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  79%|███▏| 1.43G/1.82G [05:49<01:40, 3.89MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  79%|███▏| 1.44G/1.82G [05:51<01:32, 4.09MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  80%|███▏| 1.45G/1.82G [05:53<01:27, 4.19MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  80%|███▏| 1.46G/1.82G [05:56<01:23, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  81%|███▏| 1.47G/1.82G [05:58<01:21, 4.25MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  81%|███▎| 1.48G/1.82G [06:01<01:16, 4.37MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  82%|███▎| 1.49G/1.82G [06:03<01:14, 4.40MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  83%|███▎| 1.50G/1.82G [06:05<01:11, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  83%|███▎| 1.51G/1.82G [06:08<01:08, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  84%|███▎| 1.52G/1.82G [06:10<01:06, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  84%|███▎| 1.53G/1.82G [06:12<01:04, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  85%|███▍| 1.54G/1.82G [06:15<01:01, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  85%|███▍| 1.55G/1.82G [06:17<00:59, 4.45MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  86%|███▍| 1.56G/1.82G [06:19<00:56, 4.46MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  87%|███▍| 1.57G/1.82G [06:22<00:54, 4.46MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  87%|███▍| 1.58G/1.82G [06:24<00:53, 4.36MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  88%|███▌| 1.59G/1.82G [06:26<00:49, 4.49MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  88%|███▌| 1.60G/1.82G [06:33<01:13, 2.87MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  89%|███▌| 1.61G/1.82G [06:35<01:02, 3.21MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  90%|███▌| 1.63G/1.82G [06:38<00:54, 3.51MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  90%|███▌| 1.64G/1.82G [06:40<00:48, 3.67MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  91%|███▋| 1.65G/1.82G [06:43<00:42, 3.95MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  91%|███▋| 1.66G/1.82G [06:45<00:38, 4.09MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  92%|███▋| 1.67G/1.82G [06:47<00:35, 4.19MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  92%|███▋| 1.68G/1.82G [06:50<00:32, 4.27MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  93%|███▋| 1.69G/1.82G [06:52<00:29, 4.32MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  94%|███▋| 1.70G/1.82G [06:54<00:26, 4.36MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  94%|███▊| 1.71G/1.82G [06:57<00:24, 4.38MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  95%|███▊| 1.72G/1.82G [06:59<00:21, 4.40MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  95%|███▊| 1.73G/1.82G [07:01<00:19, 4.42MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  96%|███▊| 1.74G/1.82G [07:04<00:16, 4.43MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  96%|███▊| 1.75G/1.82G [07:09<00:19, 3.23MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  97%|███▉| 1.76G/1.82G [07:11<00:15, 3.51MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  98%|███▉| 1.77G/1.82G [07:14<00:11, 3.75MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  98%|███▉| 1.78G/1.82G [07:16<00:08, 3.94MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  99%|███▉| 1.79G/1.82G [07:19<00:05, 4.08MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors:  99%|███▉| 1.80G/1.82G [07:21<00:02, 4.18MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors: 100%|███▉| 1.81G/1.82G [07:23<00:00, 4.26MB/s]\u001b[A\n",
      "model-00004-of-00007.safetensors: 100%|████| 1.82G/1.82G [07:23<00:00, 4.09MB/s]\u001b[A\n",
      "Downloading shards:  57%|█████████████▋          | 4/7 [23:20<18:53, 377.77s/it]\n",
      "model-00005-of-00007.safetensors:   0%|             | 0.00/1.97G [00:00<?, ?B/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   1%|    | 10.5M/1.97G [00:01<05:21, 6.09MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   1%|    | 21.0M/1.97G [00:03<06:00, 5.40MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   2%|    | 31.5M/1.97G [00:06<06:34, 4.91MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   2%|    | 41.9M/1.97G [00:08<06:47, 4.72MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   3%|    | 52.4M/1.97G [00:10<06:55, 4.62MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   3%|▏   | 62.9M/1.97G [00:13<06:58, 4.56MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   4%|▏   | 73.4M/1.97G [00:15<06:58, 4.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   4%|▏   | 83.9M/1.97G [00:21<09:59, 3.15MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   5%|▏   | 94.4M/1.97G [00:23<08:50, 3.53MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   5%|▎    | 105M/1.97G [00:25<08:13, 3.77MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   6%|▎    | 115M/1.97G [00:27<07:48, 3.96MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   6%|▎    | 126M/1.97G [00:30<07:31, 4.08MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   7%|▎    | 136M/1.97G [00:32<07:16, 4.20MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   7%|▎    | 147M/1.97G [00:35<07:06, 4.27MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   8%|▍    | 157M/1.97G [00:37<06:59, 4.32MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   9%|▍    | 168M/1.97G [00:39<06:56, 4.32MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:   9%|▍    | 178M/1.97G [00:42<06:46, 4.40MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  10%|▍    | 189M/1.97G [00:44<06:43, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  10%|▌    | 199M/1.97G [00:46<06:41, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  11%|▌    | 210M/1.97G [00:49<06:36, 4.44MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  11%|▌    | 220M/1.97G [00:51<06:33, 4.44MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  12%|▌    | 231M/1.97G [00:56<08:58, 3.23MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  12%|▌    | 241M/1.97G [00:59<08:10, 3.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  13%|▋    | 252M/1.97G [01:01<07:37, 3.75MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  13%|▋    | 262M/1.97G [01:03<07:13, 3.93MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  14%|▋    | 273M/1.97G [01:06<06:55, 4.08MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  14%|▋    | 283M/1.97G [01:08<06:51, 4.09MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  15%|▋    | 294M/1.97G [01:11<06:30, 4.29MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  15%|▊    | 304M/1.97G [01:13<06:24, 4.33MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  16%|▊    | 315M/1.97G [01:15<06:18, 4.37MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  17%|▊    | 325M/1.97G [01:18<06:13, 4.40MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  17%|▊    | 336M/1.97G [01:20<06:10, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  18%|▉    | 346M/1.97G [01:22<06:07, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  18%|▉    | 357M/1.97G [01:25<06:03, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  19%|▉    | 367M/1.97G [01:27<06:01, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  19%|▉    | 377M/1.97G [01:32<08:12, 3.23MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  20%|▉    | 388M/1.97G [01:35<07:28, 3.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  20%|█    | 398M/1.97G [01:37<06:58, 3.75MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  21%|█    | 409M/1.97G [01:39<06:36, 3.93MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  21%|█    | 419M/1.97G [01:42<06:19, 4.08MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  22%|█    | 430M/1.97G [01:44<06:07, 4.18MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  22%|█    | 440M/1.97G [01:46<05:58, 4.26MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  23%|█▏   | 451M/1.97G [01:49<05:52, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  23%|█▏   | 461M/1.97G [01:51<05:46, 4.35MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  24%|█▏   | 472M/1.97G [01:54<05:42, 4.37MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  25%|█▏   | 482M/1.97G [01:56<05:37, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  25%|█▎   | 493M/1.97G [01:58<05:34, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  26%|█▎   | 503M/1.97G [02:01<05:30, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  26%|█▎   | 514M/1.97G [02:06<07:31, 3.22MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  27%|█▎   | 524M/1.97G [02:08<06:51, 3.51MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  27%|█▎   | 535M/1.97G [02:11<06:23, 3.74MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  28%|█▍   | 545M/1.97G [02:13<06:01, 3.94MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  28%|█▍   | 556M/1.97G [02:15<05:46, 4.07MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  29%|█▍   | 566M/1.97G [02:18<05:35, 4.18MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  29%|█▍   | 577M/1.97G [02:20<05:27, 4.25MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  30%|█▍   | 587M/1.97G [02:22<05:20, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  30%|█▌   | 598M/1.97G [02:25<05:15, 4.35MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  31%|█▌   | 608M/1.97G [02:27<05:15, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  31%|█▌   | 619M/1.97G [02:30<05:05, 4.42MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  32%|█▌   | 629M/1.97G [02:32<05:02, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  32%|█▌   | 640M/1.97G [02:34<04:59, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  33%|█▋   | 650M/1.97G [02:37<04:56, 4.44MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  34%|█▋   | 661M/1.97G [02:42<06:44, 3.23MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  34%|█▋   | 671M/1.97G [02:44<06:08, 3.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  35%|█▋   | 682M/1.97G [02:47<05:42, 3.76MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  35%|█▊   | 692M/1.97G [02:49<05:23, 3.94MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  36%|█▊   | 703M/1.97G [02:51<05:10, 4.08MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  36%|█▊   | 713M/1.97G [02:54<04:59, 4.19MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  37%|█▊   | 724M/1.97G [02:56<04:52, 4.26MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  37%|█▊   | 734M/1.97G [02:58<04:46, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  38%|█▉   | 744M/1.97G [03:01<04:41, 4.35MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  38%|█▉   | 755M/1.97G [03:03<04:37, 4.38MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  39%|█▉   | 765M/1.97G [03:05<04:33, 4.40MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  39%|█▉   | 776M/1.97G [03:08<04:30, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  40%|█▉   | 786M/1.97G [03:10<04:26, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  40%|██   | 797M/1.97G [03:13<04:24, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  41%|██   | 807M/1.97G [03:18<06:00, 3.22MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  42%|██   | 818M/1.97G [03:20<05:27, 3.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  42%|██   | 828M/1.97G [03:23<05:03, 3.75MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  43%|██▏  | 839M/1.97G [03:25<04:47, 3.93MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  43%|██▏  | 849M/1.97G [03:27<04:34, 4.08MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  44%|██▏  | 860M/1.97G [03:30<04:24, 4.18MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  44%|██▏  | 870M/1.97G [03:32<04:17, 4.26MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  45%|██▏  | 881M/1.97G [03:34<04:12, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  45%|██▎  | 891M/1.97G [03:37<04:07, 4.35MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  46%|██▎  | 902M/1.97G [03:39<04:03, 4.38MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  46%|██▎  | 912M/1.97G [03:41<04:00, 4.40MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  47%|██▎  | 923M/1.97G [03:44<03:56, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  47%|██▎  | 933M/1.97G [03:46<03:54, 4.42MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  48%|██▍  | 944M/1.97G [03:49<03:51, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  48%|██▍  | 954M/1.97G [03:54<05:18, 3.19MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  49%|██▍  | 965M/1.97G [03:56<04:44, 3.53MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  50%|██▍  | 975M/1.97G [03:59<04:23, 3.77MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  50%|██▌  | 986M/1.97G [04:01<04:08, 3.95MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  51%|██▌  | 996M/1.97G [04:03<03:57, 4.09MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  51%|██  | 1.01G/1.97G [04:06<03:50, 4.18MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  52%|██  | 1.02G/1.97G [04:08<03:43, 4.26MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  52%|██  | 1.03G/1.97G [04:10<03:37, 4.32MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  53%|██  | 1.04G/1.97G [04:13<03:33, 4.35MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  53%|██▏ | 1.05G/1.97G [04:15<03:29, 4.39MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  54%|██▏ | 1.06G/1.97G [04:17<03:26, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  54%|██▏ | 1.07G/1.97G [04:20<03:23, 4.42MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  55%|██▏ | 1.08G/1.97G [04:22<03:20, 4.42MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  55%|██▏ | 1.09G/1.97G [04:27<04:32, 3.22MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  56%|██▏ | 1.10G/1.97G [04:30<04:06, 3.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  56%|██▎ | 1.11G/1.97G [04:32<03:48, 3.76MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  57%|██▎ | 1.12G/1.97G [04:35<03:34, 3.94MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  58%|██▎ | 1.13G/1.97G [04:37<03:25, 4.07MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  58%|██▎ | 1.14G/1.97G [04:39<03:19, 4.13MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  59%|██▎ | 1.15G/1.97G [04:42<03:10, 4.28MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  59%|██▎ | 1.16G/1.97G [04:44<03:06, 4.32MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  60%|██▍ | 1.17G/1.97G [04:46<03:02, 4.36MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  60%|██▍ | 1.18G/1.97G [04:49<02:58, 4.39MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  61%|██▍ | 1.20G/1.97G [04:51<02:55, 4.40MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  61%|██▍ | 1.21G/1.97G [04:53<02:52, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  62%|██▍ | 1.22G/1.97G [04:56<02:49, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  62%|██▍ | 1.23G/1.97G [04:58<02:47, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  63%|██▌ | 1.24G/1.97G [05:03<03:46, 3.22MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  63%|██▌ | 1.25G/1.97G [05:06<03:24, 3.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  64%|██▌ | 1.26G/1.97G [05:08<03:09, 3.75MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  64%|██▌ | 1.27G/1.97G [05:10<02:57, 3.94MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  65%|██▌ | 1.28G/1.97G [05:13<02:49, 4.07MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  66%|██▌ | 1.29G/1.97G [05:15<02:42, 4.18MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  66%|██▋ | 1.30G/1.97G [05:18<02:36, 4.26MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  67%|██▋ | 1.31G/1.97G [05:20<02:32, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  67%|██▋ | 1.32G/1.97G [05:22<02:28, 4.35MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  68%|██▋ | 1.33G/1.97G [05:25<02:25, 4.39MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  68%|██▋ | 1.34G/1.97G [05:27<02:22, 4.40MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  69%|██▋ | 1.35G/1.97G [05:29<02:19, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  69%|██▊ | 1.36G/1.97G [05:32<02:16, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  70%|██▊ | 1.37G/1.97G [05:34<02:14, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  70%|██▊ | 1.38G/1.97G [05:39<03:01, 3.23MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  71%|██▊ | 1.39G/1.97G [05:42<02:43, 3.50MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  71%|██▊ | 1.41G/1.97G [05:44<02:30, 3.75MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  72%|██▉ | 1.42G/1.97G [05:46<02:20, 3.94MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  72%|██▉ | 1.43G/1.97G [05:49<02:12, 4.08MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  73%|██▉ | 1.44G/1.97G [05:51<02:07, 4.18MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  74%|██▉ | 1.45G/1.97G [05:54<02:02, 4.26MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  74%|██▉ | 1.46G/1.97G [05:56<01:58, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  75%|██▉ | 1.47G/1.97G [05:58<01:54, 4.36MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  75%|███ | 1.48G/1.97G [06:01<01:51, 4.38MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  76%|███ | 1.49G/1.97G [06:03<01:48, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  76%|███ | 1.50G/1.97G [06:05<01:46, 4.42MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  77%|███ | 1.51G/1.97G [06:08<01:43, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  77%|███ | 1.52G/1.97G [06:10<01:41, 4.42MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  78%|███ | 1.53G/1.97G [06:15<02:15, 3.23MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  78%|███▏| 1.54G/1.97G [06:18<02:01, 3.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  79%|███▏| 1.55G/1.97G [06:20<01:50, 3.76MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  79%|███▏| 1.56G/1.97G [06:22<01:43, 3.94MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  80%|███▏| 1.57G/1.97G [06:25<01:37, 4.07MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  80%|███▏| 1.58G/1.97G [06:27<01:31, 4.19MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  81%|███▏| 1.59G/1.97G [06:29<01:27, 4.26MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  82%|███▎| 1.60G/1.97G [06:32<01:24, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  82%|███▎| 1.61G/1.97G [06:34<01:22, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  83%|███▎| 1.63G/1.97G [06:37<01:18, 4.40MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  83%|███▎| 1.64G/1.97G [06:39<01:15, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  84%|███▎| 1.65G/1.97G [06:41<01:12, 4.42MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  84%|███▎| 1.66G/1.97G [06:44<01:10, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  85%|███▍| 1.67G/1.97G [06:49<01:33, 3.23MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  85%|███▍| 1.68G/1.97G [06:51<01:22, 3.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  86%|███▍| 1.69G/1.97G [06:54<01:14, 3.75MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  86%|███▍| 1.70G/1.97G [06:56<01:08, 3.93MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  87%|███▍| 1.71G/1.97G [06:58<01:03, 4.06MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  87%|███▍| 1.72G/1.97G [07:01<00:59, 4.19MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  88%|███▌| 1.73G/1.97G [07:03<00:55, 4.26MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  88%|███▌| 1.74G/1.97G [07:05<00:52, 4.32MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  89%|███▌| 1.75G/1.97G [07:08<00:49, 4.36MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  89%|███▌| 1.76G/1.97G [07:10<00:47, 4.39MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  90%|███▌| 1.77G/1.97G [07:13<00:44, 4.40MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  91%|███▌| 1.78G/1.97G [07:15<00:42, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  91%|███▋| 1.79G/1.97G [07:17<00:39, 4.42MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  92%|███▋| 1.80G/1.97G [07:20<00:37, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  92%|███▋| 1.81G/1.97G [07:25<00:47, 3.23MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  93%|███▋| 1.82G/1.97G [07:27<00:40, 3.52MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  93%|███▋| 1.84G/1.97G [07:30<00:35, 3.74MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  94%|███▊| 1.85G/1.97G [07:32<00:31, 3.94MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  94%|███▊| 1.86G/1.97G [07:34<00:27, 4.08MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  95%|███▊| 1.87G/1.97G [07:37<00:24, 4.18MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  95%|███▊| 1.88G/1.97G [07:39<00:21, 4.26MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  96%|███▊| 1.89G/1.97G [07:41<00:18, 4.31MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  96%|███▊| 1.90G/1.97G [07:44<00:16, 4.35MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  97%|███▉| 1.91G/1.97G [07:46<00:13, 4.38MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  97%|███▉| 1.92G/1.97G [07:48<00:11, 4.39MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  98%|███▉| 1.93G/1.97G [07:51<00:08, 4.41MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  99%|███▉| 1.94G/1.97G [07:53<00:06, 4.42MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors:  99%|███▉| 1.95G/1.97G [07:56<00:04, 4.43MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors: 100%|███▉| 1.96G/1.97G [08:01<00:02, 3.22MB/s]\u001b[A\n",
      "model-00005-of-00007.safetensors: 100%|████| 1.97G/1.97G [08:04<00:00, 4.06MB/s]\u001b[A\n",
      "Downloading shards:  71%|█████████████████▏      | 5/7 [31:26<13:50, 415.31s/it]\n",
      "model-00006-of-00007.safetensors:   0%|             | 0.00/1.93G [00:00<?, ?B/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   1%|    | 10.5M/1.93G [00:01<05:59, 5.33MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   1%|    | 21.0M/1.93G [00:03<05:55, 5.36MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   2%|    | 31.5M/1.93G [00:06<06:27, 4.89MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   2%|    | 41.9M/1.93G [00:08<06:40, 4.71MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   3%|    | 52.4M/1.93G [00:11<06:52, 4.55MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   3%|▏   | 62.9M/1.93G [00:13<06:47, 4.57MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   4%|▏   | 73.4M/1.93G [00:15<06:48, 4.53MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   4%|▏   | 83.9M/1.93G [00:18<06:49, 4.51MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   5%|▏   | 94.4M/1.93G [00:20<06:48, 4.49MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   5%|▎    | 105M/1.93G [00:22<06:46, 4.48MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   6%|▎    | 115M/1.93G [00:28<09:20, 3.23MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   7%|▎    | 126M/1.93G [00:30<08:30, 3.53MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   7%|▎    | 136M/1.93G [00:32<07:56, 3.76MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   8%|▍    | 147M/1.93G [00:35<07:32, 3.94MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   8%|▍    | 157M/1.93G [00:37<07:13, 4.09MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   9%|▍    | 168M/1.93G [00:39<07:00, 4.19MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:   9%|▍    | 178M/1.93G [00:42<06:49, 4.27MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  10%|▍    | 189M/1.93G [00:44<06:41, 4.33MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  10%|▌    | 199M/1.93G [00:46<06:36, 4.36MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  11%|▌    | 210M/1.93G [00:49<06:31, 4.39MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  11%|▌    | 220M/1.93G [00:51<06:28, 4.39MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  12%|▌    | 231M/1.93G [00:53<06:24, 4.41MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  13%|▋    | 241M/1.93G [00:56<06:20, 4.43MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  13%|▋    | 252M/1.93G [00:58<06:18, 4.43MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  14%|▋    | 262M/1.93G [01:04<08:35, 3.23MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  14%|▋    | 273M/1.93G [01:06<07:50, 3.52MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  15%|▋    | 283M/1.93G [01:08<07:17, 3.75MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  15%|▊    | 294M/1.93G [01:11<06:54, 3.94MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  16%|▊    | 304M/1.93G [01:13<06:38, 4.08MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  16%|▊    | 315M/1.93G [01:15<06:24, 4.19MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  17%|▊    | 325M/1.93G [01:18<06:15, 4.27MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  17%|▊    | 336M/1.93G [01:20<06:08, 4.32MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  18%|▉    | 346M/1.93G [01:22<06:08, 4.30MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  18%|▉    | 357M/1.93G [01:25<05:56, 4.40MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  19%|▉    | 367M/1.93G [01:27<05:53, 4.41MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  20%|▉    | 377M/1.93G [01:29<05:50, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  20%|█    | 388M/1.93G [01:32<05:52, 4.36MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  21%|█    | 398M/1.93G [01:34<05:42, 4.46MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  21%|█    | 409M/1.93G [01:39<07:48, 3.24MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  22%|█    | 419M/1.93G [01:42<07:07, 3.53MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  22%|█    | 430M/1.93G [01:44<06:37, 3.77MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  23%|█▏   | 440M/1.93G [01:46<06:16, 3.95MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  23%|█▏   | 451M/1.93G [01:49<06:01, 4.09MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  24%|█▏   | 461M/1.93G [01:51<05:49, 4.19MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  24%|█▏   | 472M/1.93G [01:54<05:41, 4.26MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  25%|█▎   | 482M/1.93G [01:56<05:34, 4.32MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  26%|█▎   | 493M/1.93G [01:58<05:30, 4.34MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  26%|█▎   | 503M/1.93G [02:01<05:24, 4.39MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  27%|█▎   | 514M/1.93G [02:03<05:21, 4.40MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  27%|█▎   | 524M/1.93G [02:05<05:17, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  28%|█▍   | 535M/1.93G [02:08<05:14, 4.43MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  28%|█▍   | 545M/1.93G [02:10<05:11, 4.44MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  29%|█▍   | 556M/1.93G [02:15<07:05, 3.23MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  29%|█▍   | 566M/1.93G [02:18<06:27, 3.51MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  30%|█▍   | 577M/1.93G [02:20<06:00, 3.75MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  30%|█▌   | 587M/1.93G [02:22<05:41, 3.93MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  31%|█▌   | 598M/1.93G [02:25<05:26, 4.08MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  32%|█▌   | 608M/1.93G [02:27<05:15, 4.18MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  32%|█▌   | 619M/1.93G [02:30<05:07, 4.26MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  33%|█▋   | 629M/1.93G [02:32<05:01, 4.31MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  33%|█▋   | 640M/1.93G [02:34<04:55, 4.36MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  34%|█▋   | 650M/1.93G [02:37<04:51, 4.39MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  34%|█▋   | 661M/1.93G [02:39<04:48, 4.39MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  35%|█▋   | 671M/1.93G [02:41<04:44, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  35%|█▊   | 682M/1.93G [02:44<04:41, 4.43MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  36%|█▊   | 692M/1.93G [02:49<06:23, 3.22MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  36%|█▊   | 703M/1.93G [02:51<05:48, 3.51MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  37%|█▊   | 713M/1.93G [02:54<05:24, 3.74MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  38%|█▉   | 724M/1.93G [02:56<05:05, 3.94MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  38%|█▉   | 734M/1.93G [02:58<04:52, 4.08MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  39%|█▉   | 744M/1.93G [03:01<04:42, 4.18MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  39%|█▉   | 755M/1.93G [03:03<04:35, 4.26MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  40%|█▉   | 765M/1.93G [03:05<04:29, 4.32MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  40%|██   | 776M/1.93G [03:08<04:24, 4.35MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  41%|██   | 786M/1.93G [03:10<04:21, 4.37MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  41%|██   | 797M/1.93G [03:13<04:16, 4.40MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  42%|██   | 807M/1.93G [03:15<04:13, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  42%|██   | 818M/1.93G [03:17<04:10, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  43%|██▏  | 828M/1.93G [03:20<04:08, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  44%|██▏  | 839M/1.93G [03:25<05:37, 3.23MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  44%|██▏  | 849M/1.93G [03:27<05:06, 3.52MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  45%|██▏  | 860M/1.93G [03:30<04:44, 3.76MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  45%|██▎  | 870M/1.93G [03:32<04:28, 3.94MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  46%|██▎  | 881M/1.93G [03:34<04:16, 4.08MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  46%|██▎  | 891M/1.93G [03:37<04:07, 4.19MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  47%|██▎  | 902M/1.93G [03:39<04:00, 4.27MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  47%|██▎  | 912M/1.93G [03:41<03:55, 4.32MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  48%|██▍  | 923M/1.93G [03:44<03:50, 4.35MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  48%|██▍  | 933M/1.93G [03:46<03:46, 4.38MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  49%|██▍  | 944M/1.93G [03:48<03:43, 4.40MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  50%|██▍  | 954M/1.93G [03:51<03:40, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  50%|██▌  | 965M/1.93G [03:53<03:37, 4.43MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  51%|██▌  | 975M/1.93G [03:56<03:34, 4.44MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  51%|██▌  | 986M/1.93G [04:01<04:51, 3.23MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  52%|██▌  | 996M/1.93G [04:03<04:24, 3.52MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  52%|██  | 1.01G/1.93G [04:06<04:05, 3.75MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  53%|██  | 1.02G/1.93G [04:08<03:50, 3.94MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  53%|██▏ | 1.03G/1.93G [04:10<03:40, 4.08MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  54%|██▏ | 1.04G/1.93G [04:13<03:37, 4.10MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  54%|██▏ | 1.05G/1.93G [04:15<03:25, 4.29MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  55%|██▏ | 1.06G/1.93G [04:17<03:20, 4.33MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  55%|██▏ | 1.07G/1.93G [04:20<03:16, 4.37MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  56%|██▏ | 1.08G/1.93G [04:22<03:12, 4.39MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  57%|██▎ | 1.09G/1.93G [04:24<03:10, 4.40MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  57%|██▎ | 1.10G/1.93G [04:27<03:07, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  58%|██▎ | 1.11G/1.93G [04:29<03:04, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  58%|██▎ | 1.12G/1.93G [04:32<03:01, 4.44MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  59%|██▎ | 1.13G/1.93G [04:37<04:06, 3.23MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  59%|██▎ | 1.14G/1.93G [04:39<03:43, 3.51MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  60%|██▍ | 1.15G/1.93G [04:42<03:26, 3.75MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  60%|██▍ | 1.16G/1.93G [04:44<03:13, 3.94MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  61%|██▍ | 1.17G/1.93G [04:46<03:04, 4.08MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  61%|██▍ | 1.18G/1.93G [04:49<02:57, 4.18MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  62%|██▍ | 1.20G/1.93G [04:51<02:51, 4.26MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  63%|██▌ | 1.21G/1.93G [04:53<02:47, 4.31MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  63%|██▌ | 1.22G/1.93G [04:56<02:43, 4.36MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  64%|██▌ | 1.23G/1.93G [04:58<02:40, 4.38MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  64%|██▌ | 1.24G/1.93G [05:00<02:36, 4.40MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  65%|██▌ | 1.25G/1.93G [05:03<02:34, 4.41MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  65%|██▌ | 1.26G/1.93G [05:05<02:31, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  66%|██▋ | 1.27G/1.93G [05:08<02:28, 4.43MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  66%|██▋ | 1.28G/1.93G [05:13<03:20, 3.23MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  67%|██▋ | 1.29G/1.93G [05:15<03:01, 3.52MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  67%|██▋ | 1.30G/1.93G [05:18<02:47, 3.75MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  68%|██▋ | 1.31G/1.93G [05:20<02:39, 3.87MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  69%|██▋ | 1.32G/1.93G [05:22<02:27, 4.10MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  69%|██▊ | 1.33G/1.93G [05:25<02:21, 4.20MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  70%|██▊ | 1.34G/1.93G [05:27<02:16, 4.27MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  70%|██▊ | 1.35G/1.93G [05:29<02:12, 4.32MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  71%|██▊ | 1.36G/1.93G [05:32<02:09, 4.36MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  71%|██▊ | 1.37G/1.93G [05:34<02:06, 4.39MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  72%|██▊ | 1.38G/1.93G [05:36<02:03, 4.41MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  72%|██▉ | 1.39G/1.93G [05:39<02:00, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  73%|██▉ | 1.41G/1.93G [05:41<01:58, 4.42MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  73%|██▉ | 1.42G/1.93G [05:46<02:38, 3.22MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  74%|██▉ | 1.43G/1.93G [05:49<02:22, 3.52MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  75%|██▉ | 1.44G/1.93G [05:51<02:10, 3.75MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  75%|███ | 1.45G/1.93G [05:53<02:02, 3.93MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  76%|███ | 1.46G/1.93G [05:56<01:55, 4.08MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  76%|███ | 1.47G/1.93G [05:58<01:49, 4.18MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  77%|███ | 1.48G/1.93G [06:01<01:45, 4.26MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  77%|███ | 1.49G/1.93G [06:03<01:41, 4.31MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  78%|███ | 1.50G/1.93G [06:05<01:38, 4.35MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  78%|███▏| 1.51G/1.93G [06:08<01:35, 4.38MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors:  79%|███▏| 1.52G/1.93G [06:10<01:32, 4.40MB/s]\u001b[A\n",
      "model-00006-of-00007.safetensors: 100%|████| 1.93G/1.93G [07:50<00:00, 4.09MB/s]\u001b[A\n",
      "Downloading shards:  86%|████████████████████▌   | 6/7 [39:20<07:14, 434.63s/it]\n",
      "model-00007-of-00007.safetensors: 100%|████| 1.05G/1.05G [04:16<00:00, 4.10MB/s]\u001b[A\n",
      "Downloading shards: 100%|████████████████████████| 7/7 [43:37<00:00, 373.94s/it]\n",
      "Loading checkpoint shards: 100%|██████████████████| 7/7 [00:02<00:00,  2.74it/s]\n",
      "trainable params: 1,949,696 || all params: 6,245,533,696 || trainable%: 0.0312\n",
      "--> Model\n",
      "\n",
      "--> model has 1.949696M params\n",
      "\n",
      "Setting num_proc from 16 back to 1 for the train split to disable multiprocessing as it only contains one shard.\n",
      "Generating train split: 114599 examples [00:00, 768479.29 examples/s]\n",
      "Setting num_proc from 16 back to 1 for the validation split to disable multiprocessing as it only contains one shard.\n",
      "Generating validation split: 1070 examples [00:00, 301100.66 examples/s]\n",
      "Setting num_proc from 16 back to 1 for the test split to disable multiprocessing as it only contains one shard.\n",
      "Generating test split: 1070 examples [00:00, 341259.62 examples/s]\n",
      "Map (num_proc=16): 100%|██████| 114599/114599 [00:02<00:00, 43661.11 examples/s]\n",
      "train_dataset: Dataset({\n",
      "    features: ['input_ids', 'labels'],\n",
      "    num_rows: 114599\n",
      "})\n",
      "Map (num_proc=16): 100%|███████████| 1070/1070 [00:00<00:00, 3611.37 examples/s]\n",
      "val_dataset: Dataset({\n",
      "    features: ['input_ids', 'output_ids'],\n",
      "    num_rows: 1070\n",
      "})\n",
      "Map (num_proc=16): 100%|███████████| 1070/1070 [00:00<00:00, 3678.11 examples/s]\n",
      "test_dataset: Dataset({\n",
      "    features: ['input_ids', 'output_ids'],\n",
      "    num_rows: 1070\n",
      "})\n",
      "--> Sanity check\n",
      "           '[gMASK]': 64790 -> -100\n",
      "               'sop': 64792 -> -100\n",
      "          '<|user|>': 64795 -> -100\n",
      "                  '': 30910 -> -100\n",
      "                '\\n': 13 -> -100\n",
      "                  '': 30910 -> -100\n",
      "                '类型': 33467 -> -100\n",
      "                 '#': 31010 -> -100\n",
      "                 '裤': 56532 -> -100\n",
      "                 '*': 30998 -> -100\n",
      "                 '版': 55090 -> -100\n",
      "                 '型': 54888 -> -100\n",
      "                 '#': 31010 -> -100\n",
      "                '宽松': 40833 -> -100\n",
      "                 '*': 30998 -> -100\n",
      "                '风格': 32799 -> -100\n",
      "                 '#': 31010 -> -100\n",
      "                '性感': 40589 -> -100\n",
      "                 '*': 30998 -> -100\n",
      "                '图案': 37505 -> -100\n",
      "                 '#': 31010 -> -100\n",
      "                '线条': 37216 -> -100\n",
      "                 '*': 30998 -> -100\n",
      "                 '裤': 56532 -> -100\n",
      "                 '型': 54888 -> -100\n",
      "                 '#': 31010 -> -100\n",
      "                 '阔': 56529 -> -100\n",
      "                 '腿': 56158 -> -100\n",
      "                 '裤': 56532 -> -100\n",
      "     '<|assistant|>': 64796 -> -100\n",
      "                  '': 30910 -> 30910\n",
      "                '\\n': 13 -> 13\n",
      "                  '': 30910 -> 30910\n",
      "                '宽松': 40833 -> 40833\n",
      "                 '的': 54530 -> 54530\n",
      "                 '阔': 56529 -> 56529\n",
      "                 '腿': 56158 -> 56158\n",
      "                 '裤': 56532 -> 56532\n",
      "                 '这': 54551 -> 54551\n",
      "                '两年': 33808 -> 33808\n",
      "                '真的': 32041 -> 32041\n",
      "                 '吸': 55360 -> 55360\n",
      "                 '粉': 55486 -> 55486\n",
      "                '不少': 32138 -> 32138\n",
      "                 '，': 31123 -> 31123\n",
      "                '明星': 32943 -> 32943\n",
      "                '时尚': 33481 -> 33481\n",
      "                 '达': 54880 -> 54880\n",
      "                '人的': 31664 -> 31664\n",
      "                '心头': 46565 -> 46565\n",
      "                 '爱': 54799 -> 54799\n",
      "                 '。': 31155 -> 31155\n",
      "                '毕竟': 33051 -> 33051\n",
      "                 '好': 54591 -> 54591\n",
      "                 '穿': 55432 -> 55432\n",
      "                '时尚': 33481 -> 33481\n",
      "                 '，': 31123 -> 31123\n",
      "                 '谁': 55622 -> 55622\n",
      "                '都能': 32904 -> 32904\n",
      "                 '穿': 55432 -> 55432\n",
      "                 '出': 54557 -> 54557\n",
      "                 '腿': 56158 -> 56158\n",
      "                 '长': 54625 -> 54625\n",
      "                 '2': 30943 -> 30943\n",
      "                 '米': 55055 -> 55055\n",
      "               '的效果': 35590 -> 35590\n",
      "                '宽松': 40833 -> 40833\n",
      "                 '的': 54530 -> 54530\n",
      "                 '裤': 56532 -> 56532\n",
      "                 '腿': 56158 -> 56158\n",
      "                 '，': 31123 -> 31123\n",
      "               '当然是': 48466 -> 48466\n",
      "                 '遮': 57148 -> 57148\n",
      "                 '肉': 55343 -> 55343\n",
      "                 '小': 54603 -> 54603\n",
      "                '能手': 49355 -> 49355\n",
      "                 '啊': 55674 -> 55674\n",
      "                 '。': 31155 -> 31155\n",
      "                '上身': 51605 -> 51605\n",
      "                 '随': 55119 -> 55119\n",
      "                 '性': 54642 -> 54642\n",
      "                '自然': 31799 -> 31799\n",
      "                 '不': 54535 -> 54535\n",
      "                 '拘': 57036 -> 57036\n",
      "                 '束': 55625 -> 55625\n",
      "                 '，': 31123 -> 31123\n",
      "                '面料': 46839 -> 46839\n",
      "                 '亲': 55113 -> 55113\n",
      "                 '肤': 56089 -> 56089\n",
      "                '舒适': 33894 -> 33894\n",
      "                 '贴': 55778 -> 55778\n",
      "                '身体': 31902 -> 31902\n",
      "                 '验': 55017 -> 55017\n",
      "                 '感': 54706 -> 54706\n",
      "                 '棒': 56382 -> 56382\n",
      "                 '棒': 56382 -> 56382\n",
      "                 '哒': 59230 -> 59230\n",
      "                 '。': 31155 -> 31155\n",
      "                 '系': 54712 -> 54712\n",
      "                 '带': 54882 -> 54882\n",
      "                '部分': 31726 -> 31726\n",
      "                '增加': 31917 -> 31917\n",
      "                '设计': 31735 -> 31735\n",
      "                '看点': 45032 -> 45032\n",
      "                 '，': 31123 -> 31123\n",
      "                 '还': 54656 -> 54656\n",
      "                 '让': 54772 -> 54772\n",
      "                '单品': 46539 -> 46539\n",
      "               '的设计': 34481 -> 34481\n",
      "                 '感': 54706 -> 54706\n",
      "                '更强': 43084 -> 43084\n",
      "                 '。': 31155 -> 31155\n",
      "                '腿部': 46799 -> 46799\n",
      "                '线条': 37216 -> 37216\n",
      "                 '若': 55351 -> 55351\n",
      "                 '隐': 55733 -> 55733\n",
      "                 '若': 55351 -> 55351\n",
      "                 '现': 54600 -> 54600\n",
      "                 '的': 54530 -> 54530\n",
      "                 '，': 31123 -> 31123\n",
      "                '性感': 40589 -> 40589\n",
      "                 '撩': 58521 -> 58521\n",
      "                 '人': 54533 -> 54533\n",
      "                 '。': 31155 -> 31155\n",
      "                '颜色': 33692 -> 33692\n",
      "                 '敲': 57004 -> 57004\n",
      "                '温柔': 34678 -> 34678\n",
      "                 '的': 54530 -> 54530\n",
      "                 '，': 31123 -> 31123\n",
      "                 '与': 54619 -> 54619\n",
      "                '裤子': 44722 -> 44722\n",
      "                '本身': 32754 -> 32754\n",
      "                 '所': 54626 -> 54626\n",
      "                '呈现': 33169 -> 33169\n",
      "               '的风格': 48084 -> 48084\n",
      "                '有点': 33149 -> 33149\n",
      "                 '反': 54955 -> 54955\n",
      "                 '差': 55342 -> 55342\n",
      "                 '萌': 56842 -> 56842\n",
      "                 '。': 31155 -> 31155\n",
      "                  '': 2 -> 2\n",
      "max_steps is given, it will override any value given in num_train_epochs\n",
      "/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n",
      "  warnings.warn(_create_warning_msg(\n",
      "***** Running training *****\n",
      "  Num examples = 114,599\n",
      "  Num Epochs = 1\n",
      "  Instantaneous batch size per device = 4\n",
      "  Total train batch size (w. parallel, distributed & accumulation) = 4\n",
      "  Gradient Accumulation steps = 1\n",
      "  Total optimization steps = 3,000\n",
      "  Number of trainable parameters = 1,949,696\n",
      "{'loss': 4.8937, 'learning_rate': 4.9833333333333336e-05, 'epoch': 0.0}         \n",
      "{'loss': 4.7359, 'learning_rate': 4.966666666666667e-05, 'epoch': 0.0}          \n",
      "{'loss': 4.6107, 'learning_rate': 4.9500000000000004e-05, 'epoch': 0.0}         \n",
      "{'loss': 4.1562, 'learning_rate': 4.933333333333334e-05, 'epoch': 0.0}          \n",
      "{'loss': 4.0373, 'learning_rate': 4.9166666666666665e-05, 'epoch': 0.0}         \n",
      "{'loss': 3.999, 'learning_rate': 4.9e-05, 'epoch': 0.0}                         \n",
      "{'loss': 3.9322, 'learning_rate': 4.883333333333334e-05, 'epoch': 0.0}          \n",
      "{'loss': 3.9074, 'learning_rate': 4.866666666666667e-05, 'epoch': 0.0}          \n",
      "{'loss': 3.7359, 'learning_rate': 4.85e-05, 'epoch': 0.0}                       \n",
      "{'loss': 3.708, 'learning_rate': 4.8333333333333334e-05, 'epoch': 0.0}          \n",
      "{'loss': 3.6365, 'learning_rate': 4.8166666666666674e-05, 'epoch': 0.0}         \n",
      "{'loss': 3.6275, 'learning_rate': 4.8e-05, 'epoch': 0.0}                        \n",
      "{'loss': 3.6957, 'learning_rate': 4.7833333333333335e-05, 'epoch': 0.0}         \n",
      "{'loss': 3.6846, 'learning_rate': 4.766666666666667e-05, 'epoch': 0.0}          \n",
      "{'loss': 3.6877, 'learning_rate': 4.75e-05, 'epoch': 0.01}                      \n",
      "{'loss': 3.6217, 'learning_rate': 4.7333333333333336e-05, 'epoch': 0.01}        \n",
      "{'loss': 3.5719, 'learning_rate': 4.716666666666667e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.6273, 'learning_rate': 4.7e-05, 'epoch': 0.01}                       \n",
      "{'loss': 3.5574, 'learning_rate': 4.683333333333334e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.6172, 'learning_rate': 4.666666666666667e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.5656, 'learning_rate': 4.6500000000000005e-05, 'epoch': 0.01}        \n",
      "{'loss': 3.5385, 'learning_rate': 4.633333333333333e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.516, 'learning_rate': 4.6166666666666666e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.4863, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.5465, 'learning_rate': 4.5833333333333334e-05, 'epoch': 0.01}        \n",
      "{'loss': 3.5586, 'learning_rate': 4.566666666666667e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.5646, 'learning_rate': 4.55e-05, 'epoch': 0.01}                      \n",
      "{'loss': 3.6336, 'learning_rate': 4.5333333333333335e-05, 'epoch': 0.01}        \n",
      "{'loss': 3.458, 'learning_rate': 4.516666666666667e-05, 'epoch': 0.01}          \n",
      "{'loss': 3.4937, 'learning_rate': 4.5e-05, 'epoch': 0.01}                       \n",
      "{'loss': 3.5414, 'learning_rate': 4.483333333333333e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.5676, 'learning_rate': 4.466666666666667e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.5379, 'learning_rate': 4.4500000000000004e-05, 'epoch': 0.01}        \n",
      "{'loss': 3.448, 'learning_rate': 4.433333333333334e-05, 'epoch': 0.01}          \n",
      "{'loss': 3.6127, 'learning_rate': 4.4166666666666665e-05, 'epoch': 0.01}        \n",
      "{'loss': 3.5637, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.01}        \n",
      "{'loss': 3.5283, 'learning_rate': 4.383333333333334e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.6338, 'learning_rate': 4.3666666666666666e-05, 'epoch': 0.01}        \n",
      "{'loss': 3.4441, 'learning_rate': 4.35e-05, 'epoch': 0.01}                      \n",
      "{'loss': 3.4432, 'learning_rate': 4.3333333333333334e-05, 'epoch': 0.01}        \n",
      "{'loss': 3.3937, 'learning_rate': 4.316666666666667e-05, 'epoch': 0.01}         \n",
      "{'loss': 3.4801, 'learning_rate': 4.3e-05, 'epoch': 0.01}                       \n",
      "{'loss': 3.3377, 'learning_rate': 4.2833333333333335e-05, 'epoch': 0.02}        \n",
      "{'loss': 3.5885, 'learning_rate': 4.266666666666667e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.5895, 'learning_rate': 4.25e-05, 'epoch': 0.02}                      \n",
      "{'loss': 3.4461, 'learning_rate': 4.233333333333334e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.4824, 'learning_rate': 4.216666666666667e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.4891, 'learning_rate': 4.2e-05, 'epoch': 0.02}                       \n",
      "{'loss': 3.3883, 'learning_rate': 4.183333333333334e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.4705, 'learning_rate': 4.166666666666667e-05, 'epoch': 0.02}         \n",
      " 17%|██████▋                                 | 500/3000 [05:53<29:04,  1.43it/s]***** Running Evaluation *****\n",
      "  Num examples = 50\n",
      "  Batch size = 16\n",
      "\n",
      "100%|█████████████████████████████████████████████| 4/4 [00:12<00:00,  3.26s/it]\u001b[ABuilding prefix dict from the default dictionary ...\n",
      "Dumping model to file cache /tmp/jieba.cache\n",
      "Loading model cost 0.585 seconds.\n",
      "Prefix dict has been built successfully.\n",
      "\n",
      "{'eval_rouge-1': 30.567896, 'eval_rouge-2': 6.281970000000001, 'eval_rouge-l': 25.334400000000002, 'eval_bleu-4': 0.03251626427627101, 'eval_runtime': 19.8442, 'eval_samples_per_second': 2.52, 'eval_steps_per_second': 0.202, 'epoch': 0.02}\n",
      "\n",
      "                                                                                \u001b[ASaving model checkpoint to ./output/tmp-checkpoint-500\n",
      "/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/peft/utils/other.py:689: UserWarning: Unable to fetch remote file due to the following error (ReadTimeoutError(\"HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)\"), '(Request ID: fc0c2f72-fc62-4d18-a6cb-653e2c1d251c)') - silently ignoring the lookup for the file config.json in THUDM/chatglm3-6b.\n",
      "  warnings.warn(\n",
      "/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/peft/utils/save_and_load.py:243: UserWarning: Could not find a config file in THUDM/chatglm3-6b - will assume that the vocabulary was not modified.\n",
      "  warnings.warn(\n",
      "{'loss': 3.4697, 'learning_rate': 4.15e-05, 'epoch': 0.02}                      \n",
      "{'loss': 3.5053, 'learning_rate': 4.133333333333333e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.6205, 'learning_rate': 4.116666666666667e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.4545, 'learning_rate': 4.1e-05, 'epoch': 0.02}                       \n",
      "{'loss': 3.4691, 'learning_rate': 4.0833333333333334e-05, 'epoch': 0.02}        \n",
      "{'loss': 3.4041, 'learning_rate': 4.066666666666667e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.5127, 'learning_rate': 4.05e-05, 'epoch': 0.02}                      \n",
      "{'loss': 3.44, 'learning_rate': 4.0333333333333336e-05, 'epoch': 0.02}          \n",
      "{'loss': 3.4303, 'learning_rate': 4.016666666666667e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.4225, 'learning_rate': 4e-05, 'epoch': 0.02}                         \n",
      "{'loss': 3.5615, 'learning_rate': 3.983333333333333e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.5162, 'learning_rate': 3.966666666666667e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.5756, 'learning_rate': 3.9500000000000005e-05, 'epoch': 0.02}        \n",
      "{'loss': 3.5494, 'learning_rate': 3.933333333333333e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.5914, 'learning_rate': 3.9166666666666665e-05, 'epoch': 0.02}        \n",
      "{'loss': 3.5264, 'learning_rate': 3.9000000000000006e-05, 'epoch': 0.02}        \n",
      "{'loss': 3.4316, 'learning_rate': 3.883333333333333e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.3953, 'learning_rate': 3.866666666666667e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.4484, 'learning_rate': 3.85e-05, 'epoch': 0.02}                      \n",
      "{'loss': 3.434, 'learning_rate': 3.8333333333333334e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.3814, 'learning_rate': 3.816666666666667e-05, 'epoch': 0.02}         \n",
      "{'loss': 3.4803, 'learning_rate': 3.8e-05, 'epoch': 0.03}                       \n",
      "{'loss': 3.4471, 'learning_rate': 3.7833333333333336e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.4221, 'learning_rate': 3.766666666666667e-05, 'epoch': 0.03}         \n",
      "{'loss': 3.3902, 'learning_rate': 3.7500000000000003e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.4076, 'learning_rate': 3.733333333333334e-05, 'epoch': 0.03}         \n",
      "{'loss': 3.5701, 'learning_rate': 3.7166666666666664e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.4395, 'learning_rate': 3.7e-05, 'epoch': 0.03}                       \n",
      "{'loss': 3.4721, 'learning_rate': 3.683333333333334e-05, 'epoch': 0.03}         \n",
      "{'loss': 3.4992, 'learning_rate': 3.6666666666666666e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.4973, 'learning_rate': 3.65e-05, 'epoch': 0.03}                      \n",
      "{'loss': 3.61, 'learning_rate': 3.633333333333333e-05, 'epoch': 0.03}           \n",
      "{'loss': 3.3869, 'learning_rate': 3.6166666666666674e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.5307, 'learning_rate': 3.6e-05, 'epoch': 0.03}                       \n",
      "{'loss': 3.4045, 'learning_rate': 3.5833333333333335e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.4244, 'learning_rate': 3.566666666666667e-05, 'epoch': 0.03}         \n",
      "{'loss': 3.3973, 'learning_rate': 3.55e-05, 'epoch': 0.03}                      \n",
      "{'loss': 3.5367, 'learning_rate': 3.5333333333333336e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.4006, 'learning_rate': 3.516666666666667e-05, 'epoch': 0.03}         \n",
      "{'loss': 3.3957, 'learning_rate': 3.5e-05, 'epoch': 0.03}                       \n",
      "{'loss': 3.5447, 'learning_rate': 3.483333333333334e-05, 'epoch': 0.03}         \n",
      "{'loss': 3.4432, 'learning_rate': 3.466666666666667e-05, 'epoch': 0.03}         \n",
      "{'loss': 3.4211, 'learning_rate': 3.45e-05, 'epoch': 0.03}                      \n",
      "{'loss': 3.3471, 'learning_rate': 3.433333333333333e-05, 'epoch': 0.03}         \n",
      "{'loss': 3.4416, 'learning_rate': 3.4166666666666666e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.4801, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.4182, 'learning_rate': 3.3833333333333334e-05, 'epoch': 0.03}        \n",
      "{'loss': 3.2783, 'learning_rate': 3.366666666666667e-05, 'epoch': 0.03}         \n",
      "{'loss': 3.3508, 'learning_rate': 3.35e-05, 'epoch': 0.03}                      \n",
      "{'loss': 3.4672, 'learning_rate': 3.3333333333333335e-05, 'epoch': 0.03}        \n",
      " 33%|█████████████                          | 1000/3000 [13:26<28:12,  1.18it/s]/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n",
      "  warnings.warn(_create_warning_msg(\n",
      "***** Running Evaluation *****\n",
      "  Num examples = 50\n",
      "  Batch size = 16\n",
      "\n",
      "  0%|                                                     | 0/4 [00:00<?, ?it/s]\u001b[A\n",
      " 50%|██████████████████████▌                      | 2/4 [00:27<00:27, 13.92s/it]\u001b[A\n",
      " 75%|█████████████████████████████████▊           | 3/4 [00:55<00:19, 19.82s/it]\u001b[A\n",
      "100%|█████████████████████████████████████████████| 4/4 [01:20<00:00, 21.75s/it]\u001b[A\n",
      "{'eval_rouge-1': 30.515234, 'eval_rouge-2': 6.588687999999999, 'eval_rouge-l': 24.004728, 'eval_bleu-4': 0.03379437843849515, 'eval_runtime': 85.8912, 'eval_samples_per_second': 0.582, 'eval_steps_per_second': 0.047, 'epoch': 0.03}\n",
      "\n",
      " 33%|█████████████                          | 1000/3000 [14:52<28:12,  1.18it/s]\u001b[A\n",
      "                                                                                \u001b[ASaving model checkpoint to ./output/tmp-checkpoint-1000\n",
      "/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n",
      "loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--THUDM--chatglm3-6b/snapshots/741f31cf376b07d7b9ed87814bba271bdd49cc16/config.json\n",
      "Model config ChatGLMConfig {\n",
      "  \"_name_or_path\": \"THUDM/chatglm3-6b\",\n",
      "  \"add_bias_linear\": false,\n",
      "  \"add_qkv_bias\": true,\n",
      "  \"apply_query_key_layer_scaling\": true,\n",
      "  \"apply_residual_connection_post_layernorm\": false,\n",
      "  \"architectures\": [\n",
      "    \"ChatGLMModel\"\n",
      "  ],\n",
      "  \"attention_dropout\": 0.0,\n",
      "  \"attention_softmax_in_fp32\": true,\n",
      "  \"auto_map\": {\n",
      "    \"AutoConfig\": \"THUDM/chatglm3-6b--configuration_chatglm.ChatGLMConfig\",\n",
      "    \"AutoModel\": \"THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForConditionalGeneration\",\n",
      "    \"AutoModelForCausalLM\": \"THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForConditionalGeneration\",\n",
      "    \"AutoModelForSeq2SeqLM\": \"THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForConditionalGeneration\",\n",
      "    \"AutoModelForSequenceClassification\": \"THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForSequenceClassification\"\n",
      "  },\n",
      "  \"bias_dropout_fusion\": true,\n",
      "  \"classifier_dropout\": null,\n",
      "  \"eos_token_id\": 2,\n",
      "  \"ffn_hidden_size\": 13696,\n",
      "  \"fp32_residual_connection\": false,\n",
      "  \"hidden_dropout\": 0.0,\n",
      "  \"hidden_size\": 4096,\n",
      "  \"kv_channels\": 128,\n",
      "  \"layernorm_epsilon\": 1e-05,\n",
      "  \"model_type\": \"chatglm\",\n",
      "  \"multi_query_attention\": true,\n",
      "  \"multi_query_group_num\": 2,\n",
      "  \"num_attention_heads\": 32,\n",
      "  \"num_layers\": 28,\n",
      "  \"original_rope\": true,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"padded_vocab_size\": 65024,\n",
      "  \"post_layer_norm\": true,\n",
      "  \"pre_seq_len\": null,\n",
      "  \"prefix_projection\": false,\n",
      "  \"quantization_bit\": 0,\n",
      "  \"rmsnorm\": true,\n",
      "  \"seq_length\": 8192,\n",
      "  \"tie_word_embeddings\": false,\n",
      "  \"torch_dtype\": \"float16\",\n",
      "  \"transformers_version\": \"4.37.2\",\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 65024\n",
      "}\n",
      "\n",
      "{'loss': 3.3197, 'learning_rate': 3.316666666666667e-05, 'epoch': 0.04}         \n",
      "{'loss': 3.3232, 'learning_rate': 3.3e-05, 'epoch': 0.04}                       \n",
      "{'loss': 3.3766, 'learning_rate': 3.283333333333333e-05, 'epoch': 0.04}         \n",
      "{'loss': 3.532, 'learning_rate': 3.266666666666667e-05, 'epoch': 0.04}          \n",
      "{'loss': 3.4779, 'learning_rate': 3.2500000000000004e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.4145, 'learning_rate': 3.233333333333333e-05, 'epoch': 0.04}         \n",
      "{'loss': 3.4102, 'learning_rate': 3.2166666666666665e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.4021, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.3572, 'learning_rate': 3.183333333333334e-05, 'epoch': 0.04}         \n",
      "{'loss': 3.5006, 'learning_rate': 3.1666666666666666e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.4516, 'learning_rate': 3.15e-05, 'epoch': 0.04}                      \n",
      "{'loss': 3.4539, 'learning_rate': 3.1333333333333334e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.3916, 'learning_rate': 3.116666666666667e-05, 'epoch': 0.04}         \n",
      "{'loss': 3.4318, 'learning_rate': 3.1e-05, 'epoch': 0.04}                       \n",
      "{'loss': 3.4158, 'learning_rate': 3.0833333333333335e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.3254, 'learning_rate': 3.066666666666667e-05, 'epoch': 0.04}         \n",
      "{'loss': 3.384, 'learning_rate': 3.05e-05, 'epoch': 0.04}                       \n",
      "{'loss': 3.3412, 'learning_rate': 3.0333333333333337e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.2908, 'learning_rate': 3.016666666666667e-05, 'epoch': 0.04}         \n",
      "{'loss': 3.4818, 'learning_rate': 3e-05, 'epoch': 0.04}                         \n",
      "{'loss': 3.4594, 'learning_rate': 2.9833333333333335e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.4318, 'learning_rate': 2.9666666666666672e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.4027, 'learning_rate': 2.95e-05, 'epoch': 0.04}                      \n",
      "{'loss': 3.4119, 'learning_rate': 2.9333333333333336e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.4268, 'learning_rate': 2.916666666666667e-05, 'epoch': 0.04}         \n",
      "{'loss': 3.4482, 'learning_rate': 2.9e-05, 'epoch': 0.04}                       \n",
      "{'loss': 3.4359, 'learning_rate': 2.8833333333333334e-05, 'epoch': 0.04}        \n",
      "{'loss': 3.502, 'learning_rate': 2.8666666666666668e-05, 'epoch': 0.04}         \n",
      "{'loss': 3.3041, 'learning_rate': 2.8499999999999998e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.3619, 'learning_rate': 2.8333333333333335e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.4654, 'learning_rate': 2.816666666666667e-05, 'epoch': 0.05}         \n",
      "{'loss': 3.3211, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.4488, 'learning_rate': 2.7833333333333333e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.458, 'learning_rate': 2.7666666666666667e-05, 'epoch': 0.05}         \n",
      "{'loss': 3.5879, 'learning_rate': 2.7500000000000004e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.175, 'learning_rate': 2.733333333333333e-05, 'epoch': 0.05}          \n",
      "{'loss': 3.6328, 'learning_rate': 2.716666666666667e-05, 'epoch': 0.05}         \n",
      "{'loss': 3.3754, 'learning_rate': 2.7000000000000002e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.2717, 'learning_rate': 2.6833333333333333e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.3586, 'learning_rate': 2.6666666666666667e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.4123, 'learning_rate': 2.6500000000000004e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.2799, 'learning_rate': 2.633333333333333e-05, 'epoch': 0.05}         \n",
      "{'loss': 3.3674, 'learning_rate': 2.6166666666666668e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.257, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.05}         \n",
      "{'loss': 3.3438, 'learning_rate': 2.5833333333333336e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.4039, 'learning_rate': 2.5666666666666666e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.4111, 'learning_rate': 2.5500000000000003e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.44, 'learning_rate': 2.5333333333333337e-05, 'epoch': 0.05}          \n",
      "{'loss': 3.4814, 'learning_rate': 2.5166666666666667e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.3543, 'learning_rate': 2.5e-05, 'epoch': 0.05}                       \n",
      " 50%|███████████████████▌                   | 1500/3000 [21:52<17:50,  1.40it/s]\n",
      "***** Running Evaluation *****\n",
      "  Num examples = 50\n",
      "  Batch size = 16\n",
      "\n",
      "  0%|                                                     | 0/4 [00:00<?, ?it/s]\u001b[A\n",
      " 50%|██████████████████████▌                      | 2/4 [00:24<00:24, 12.04s/it]\u001b[A\n",
      " 75%|█████████████████████████████████▊           | 3/4 [00:47<00:16, 16.65s/it]\u001b[A\n",
      "100%|█████████████████████████████████████████████| 4/4 [01:10<00:00, 19.18s/it]\u001b[A\n",
      "{'eval_rouge-1': 31.202128000000002, 'eval_rouge-2': 6.8413319999999995, 'eval_rouge-l': 22.664382, 'eval_bleu-4': 0.029822912162036138, 'eval_runtime': 94.8527, 'eval_samples_per_second': 0.527, 'eval_steps_per_second': 0.042, 'epoch': 0.05}\n",
      "\n",
      " 50%|███████████████████▌                   | 1500/3000 [23:26<17:50,  1.40it/s]\u001b[A\n",
      "                                                                                \u001b[ASaving model checkpoint to ./output/tmp-checkpoint-1500\n",
      "\n",
      "{'loss': 3.3191, 'learning_rate': 2.4833333333333335e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.3133, 'learning_rate': 2.466666666666667e-05, 'epoch': 0.05}         \n",
      "{'loss': 3.3215, 'learning_rate': 2.45e-05, 'epoch': 0.05}                      \n",
      "{'loss': 3.3697, 'learning_rate': 2.4333333333333336e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.3615, 'learning_rate': 2.4166666666666667e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.4252, 'learning_rate': 2.4e-05, 'epoch': 0.05}                       \n",
      "{'loss': 3.3916, 'learning_rate': 2.3833333333333334e-05, 'epoch': 0.05}        \n",
      "{'loss': 3.5131, 'learning_rate': 2.3666666666666668e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.4262, 'learning_rate': 2.35e-05, 'epoch': 0.06}                      \n",
      "{'loss': 3.4756, 'learning_rate': 2.3333333333333336e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.3861, 'learning_rate': 2.3166666666666666e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.3654, 'learning_rate': 2.3000000000000003e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.4092, 'learning_rate': 2.2833333333333334e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.2176, 'learning_rate': 2.2666666666666668e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.3951, 'learning_rate': 2.25e-05, 'epoch': 0.06}                      \n",
      "{'loss': 3.4074, 'learning_rate': 2.2333333333333335e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.5664, 'learning_rate': 2.216666666666667e-05, 'epoch': 0.06}         \n",
      "{'loss': 3.3922, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.3449, 'learning_rate': 2.1833333333333333e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.3744, 'learning_rate': 2.1666666666666667e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.3865, 'learning_rate': 2.15e-05, 'epoch': 0.06}                      \n",
      "{'loss': 3.5195, 'learning_rate': 2.1333333333333335e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.6143, 'learning_rate': 2.116666666666667e-05, 'epoch': 0.06}         \n",
      "{'loss': 3.365, 'learning_rate': 2.1e-05, 'epoch': 0.06}                        \n",
      "{'loss': 3.3252, 'learning_rate': 2.0833333333333336e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.4656, 'learning_rate': 2.0666666666666666e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.3002, 'learning_rate': 2.05e-05, 'epoch': 0.06}                      \n",
      "{'loss': 3.4168, 'learning_rate': 2.0333333333333334e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.299, 'learning_rate': 2.0166666666666668e-05, 'epoch': 0.06}         \n",
      "{'loss': 3.5039, 'learning_rate': 2e-05, 'epoch': 0.06}                         \n",
      "{'loss': 3.2756, 'learning_rate': 1.9833333333333335e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.2943, 'learning_rate': 1.9666666666666666e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.5125, 'learning_rate': 1.9500000000000003e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.3551, 'learning_rate': 1.9333333333333333e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.4291, 'learning_rate': 1.9166666666666667e-05, 'epoch': 0.06}        \n",
      "{'loss': 3.3586, 'learning_rate': 1.9e-05, 'epoch': 0.06}                       \n",
      "{'loss': 3.4344, 'learning_rate': 1.8833333333333335e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.4277, 'learning_rate': 1.866666666666667e-05, 'epoch': 0.07}         \n",
      "{'loss': 3.368, 'learning_rate': 1.85e-05, 'epoch': 0.07}                       \n",
      "{'loss': 3.3607, 'learning_rate': 1.8333333333333333e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.4531, 'learning_rate': 1.8166666666666667e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.4254, 'learning_rate': 1.8e-05, 'epoch': 0.07}                       \n",
      "{'loss': 3.3984, 'learning_rate': 1.7833333333333334e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.3254, 'learning_rate': 1.7666666666666668e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.4238, 'learning_rate': 1.75e-05, 'epoch': 0.07}                      \n",
      "{'loss': 3.357, 'learning_rate': 1.7333333333333336e-05, 'epoch': 0.07}         \n",
      "{'loss': 3.557, 'learning_rate': 1.7166666666666666e-05, 'epoch': 0.07}         \n",
      "{'loss': 3.2918, 'learning_rate': 1.7000000000000003e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.2486, 'learning_rate': 1.6833333333333334e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.3762, 'learning_rate': 1.6666666666666667e-05, 'epoch': 0.07}        \n",
      " 67%|██████████████████████████             | 2000/3000 [29:22<12:10,  1.37it/s]\n",
      "***** Running Evaluation *****\n",
      "  Num examples = 50\n",
      "  Batch size = 16\n",
      "\n",
      "  0%|                                                     | 0/4 [00:00<?, ?it/s]\u001b[A\n",
      " 50%|██████████████████████▌                      | 2/4 [00:37<00:37, 18.71s/it]\u001b[A\n",
      " 75%|█████████████████████████████████▊           | 3/4 [01:15<00:26, 26.80s/it]\u001b[A\n",
      "100%|█████████████████████████████████████████████| 4/4 [01:20<00:00, 18.75s/it]\u001b[A\n",
      "{'eval_rouge-1': 31.25312, 'eval_rouge-2': 6.79938, 'eval_rouge-l': 23.455142000000002, 'eval_bleu-4': 0.03268422941034572, 'eval_runtime': 123.991, 'eval_samples_per_second': 0.403, 'eval_steps_per_second': 0.032, 'epoch': 0.07}\n",
      "\n",
      " 67%|██████████████████████████             | 2000/3000 [31:26<12:10,  1.37it/s]\u001b[A\n",
      "                                                                                \u001b[ASaving model checkpoint to ./output/tmp-checkpoint-2000\n",
      "\n",
      "{'loss': 3.4906, 'learning_rate': 1.65e-05, 'epoch': 0.07}                      \n",
      "{'loss': 3.2854, 'learning_rate': 1.6333333333333335e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.2068, 'learning_rate': 1.6166666666666665e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.4605, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.3424, 'learning_rate': 1.5833333333333333e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.5105, 'learning_rate': 1.5666666666666667e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.2871, 'learning_rate': 1.55e-05, 'epoch': 0.07}                      \n",
      "{'loss': 3.2295, 'learning_rate': 1.5333333333333334e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.357, 'learning_rate': 1.5166666666666668e-05, 'epoch': 0.07}         \n",
      "{'loss': 3.5467, 'learning_rate': 1.5e-05, 'epoch': 0.07}                       \n",
      "{'loss': 3.4027, 'learning_rate': 1.4833333333333336e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.4813, 'learning_rate': 1.4666666666666668e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.3547, 'learning_rate': 1.45e-05, 'epoch': 0.07}                      \n",
      "{'loss': 3.2604, 'learning_rate': 1.4333333333333334e-05, 'epoch': 0.07}        \n",
      "{'loss': 3.5307, 'learning_rate': 1.4166666666666668e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.4002, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.3408, 'learning_rate': 1.3833333333333334e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.3328, 'learning_rate': 1.3666666666666666e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.3678, 'learning_rate': 1.3500000000000001e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.2973, 'learning_rate': 1.3333333333333333e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.3055, 'learning_rate': 1.3166666666666665e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.4057, 'learning_rate': 1.3000000000000001e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.4596, 'learning_rate': 1.2833333333333333e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.4051, 'learning_rate': 1.2666666666666668e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.2668, 'learning_rate': 1.25e-05, 'epoch': 0.08}                      \n",
      "{'loss': 3.248, 'learning_rate': 1.2333333333333334e-05, 'epoch': 0.08}         \n",
      "{'loss': 3.426, 'learning_rate': 1.2166666666666668e-05, 'epoch': 0.08}         \n",
      "{'loss': 3.3771, 'learning_rate': 1.2e-05, 'epoch': 0.08}                       \n",
      "{'loss': 3.3551, 'learning_rate': 1.1833333333333334e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.3932, 'learning_rate': 1.1666666666666668e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.448, 'learning_rate': 1.1500000000000002e-05, 'epoch': 0.08}         \n",
      "{'loss': 3.3762, 'learning_rate': 1.1333333333333334e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.2244, 'learning_rate': 1.1166666666666668e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.2635, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.4504, 'learning_rate': 1.0833333333333334e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.2801, 'learning_rate': 1.0666666666666667e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.4568, 'learning_rate': 1.05e-05, 'epoch': 0.08}                      \n",
      "{'loss': 3.4785, 'learning_rate': 1.0333333333333333e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.4623, 'learning_rate': 1.0166666666666667e-05, 'epoch': 0.08}        \n",
      "{'loss': 3.4801, 'learning_rate': 1e-05, 'epoch': 0.08}                         \n",
      "{'loss': 3.4312, 'learning_rate': 9.833333333333333e-06, 'epoch': 0.08}         \n",
      "{'loss': 3.36, 'learning_rate': 9.666666666666667e-06, 'epoch': 0.08}           \n",
      "{'loss': 3.2896, 'learning_rate': 9.5e-06, 'epoch': 0.08}                       \n",
      "{'loss': 3.4322, 'learning_rate': 9.333333333333334e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3393, 'learning_rate': 9.166666666666666e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3729, 'learning_rate': 9e-06, 'epoch': 0.09}                         \n",
      "{'loss': 3.4156, 'learning_rate': 8.833333333333334e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3988, 'learning_rate': 8.666666666666668e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.435, 'learning_rate': 8.500000000000002e-06, 'epoch': 0.09}          \n",
      "{'loss': 3.3822, 'learning_rate': 8.333333333333334e-06, 'epoch': 0.09}         \n",
      " 83%|████████████████████████████████▌      | 2500/3000 [37:22<05:50,  1.43it/s]\n",
      "***** Running Evaluation *****\n",
      "  Num examples = 50\n",
      "  Batch size = 16\n",
      "\n",
      "  0%|                                                     | 0/4 [00:00<?, ?it/s]\u001b[A\n",
      " 50%|██████████████████████▌                      | 2/4 [00:04<00:04,  2.46s/it]\u001b[A\n",
      " 75%|█████████████████████████████████▊           | 3/4 [00:10<00:03,  3.74s/it]\u001b[A\n",
      "100%|█████████████████████████████████████████████| 4/4 [00:14<00:00,  3.75s/it]\u001b[A\n",
      "{'eval_rouge-1': 32.719822, 'eval_rouge-2': 7.4275839999999995, 'eval_rouge-l': 25.128382000000002, 'eval_bleu-4': 0.03471444661210609, 'eval_runtime': 42.5407, 'eval_samples_per_second': 1.175, 'eval_steps_per_second': 0.094, 'epoch': 0.09}\n",
      "\n",
      " 83%|████████████████████████████████▌      | 2500/3000 [38:04<05:50,  1.43it/s]\u001b[A\n",
      "                                                                                \u001b[ASaving model checkpoint to ./output/tmp-checkpoint-2500\n",
      "\n",
      "{'loss': 3.3975, 'learning_rate': 8.166666666666668e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3395, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.5033, 'learning_rate': 7.833333333333333e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.2904, 'learning_rate': 7.666666666666667e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3461, 'learning_rate': 7.5e-06, 'epoch': 0.09}                       \n",
      "{'loss': 3.258, 'learning_rate': 7.333333333333334e-06, 'epoch': 0.09}          \n",
      "{'loss': 3.4178, 'learning_rate': 7.166666666666667e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.29, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.09}           \n",
      "{'loss': 3.3336, 'learning_rate': 6.833333333333333e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3449, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.2623, 'learning_rate': 6.5000000000000004e-06, 'epoch': 0.09}        \n",
      "{'loss': 3.4258, 'learning_rate': 6.333333333333334e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3205, 'learning_rate': 6.166666666666667e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.4736, 'learning_rate': 6e-06, 'epoch': 0.09}                         \n",
      "{'loss': 3.4928, 'learning_rate': 5.833333333333334e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3186, 'learning_rate': 5.666666666666667e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.4344, 'learning_rate': 5.500000000000001e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.1904, 'learning_rate': 5.333333333333334e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.4949, 'learning_rate': 5.166666666666667e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3363, 'learning_rate': 5e-06, 'epoch': 0.09}                         \n",
      "{'loss': 3.2508, 'learning_rate': 4.833333333333333e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.5281, 'learning_rate': 4.666666666666667e-06, 'epoch': 0.09}         \n",
      "{'loss': 3.3281, 'learning_rate': 4.5e-06, 'epoch': 0.1}                        \n",
      "{'loss': 3.2305, 'learning_rate': 4.333333333333334e-06, 'epoch': 0.1}          \n",
      "{'loss': 3.3645, 'learning_rate': 4.166666666666667e-06, 'epoch': 0.1}          \n",
      "{'loss': 3.2992, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.1}          \n",
      "{'loss': 3.3084, 'learning_rate': 3.833333333333334e-06, 'epoch': 0.1}          \n",
      "{'loss': 3.451, 'learning_rate': 3.666666666666667e-06, 'epoch': 0.1}           \n",
      "{'loss': 3.4105, 'learning_rate': 3.5000000000000004e-06, 'epoch': 0.1}         \n",
      "{'loss': 3.2891, 'learning_rate': 3.3333333333333333e-06, 'epoch': 0.1}         \n",
      "{'loss': 3.3318, 'learning_rate': 3.166666666666667e-06, 'epoch': 0.1}          \n",
      "{'loss': 3.4225, 'learning_rate': 3e-06, 'epoch': 0.1}                          \n",
      "{'loss': 3.3889, 'learning_rate': 2.8333333333333335e-06, 'epoch': 0.1}         \n",
      "{'loss': 3.4721, 'learning_rate': 2.666666666666667e-06, 'epoch': 0.1}          \n",
      "{'loss': 3.4826, 'learning_rate': 2.5e-06, 'epoch': 0.1}                        \n",
      "{'loss': 3.3594, 'learning_rate': 2.3333333333333336e-06, 'epoch': 0.1}         \n",
      "{'loss': 3.4656, 'learning_rate': 2.166666666666667e-06, 'epoch': 0.1}          \n",
      "{'loss': 3.3619, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.1}         \n",
      "{'loss': 3.358, 'learning_rate': 1.8333333333333335e-06, 'epoch': 0.1}          \n",
      "{'loss': 3.3797, 'learning_rate': 1.6666666666666667e-06, 'epoch': 0.1}         \n",
      "{'loss': 3.1758, 'learning_rate': 1.5e-06, 'epoch': 0.1}                        \n",
      "{'loss': 3.3969, 'learning_rate': 1.3333333333333334e-06, 'epoch': 0.1}         \n",
      "{'loss': 3.5234, 'learning_rate': 1.1666666666666668e-06, 'epoch': 0.1}         \n",
      "{'loss': 3.4031, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.1}         \n",
      "{'loss': 3.4793, 'learning_rate': 8.333333333333333e-07, 'epoch': 0.1}          \n",
      "{'loss': 3.3672, 'learning_rate': 6.666666666666667e-07, 'epoch': 0.1}          \n",
      "{'loss': 3.4021, 'learning_rate': 5.000000000000001e-07, 'epoch': 0.1}          \n",
      "{'loss': 3.3193, 'learning_rate': 3.3333333333333335e-07, 'epoch': 0.1}         \n",
      "{'loss': 3.3189, 'learning_rate': 1.6666666666666668e-07, 'epoch': 0.1}         \n",
      "{'loss': 3.4273, 'learning_rate': 0.0, 'epoch': 0.1}                            \n",
      "100%|███████████████████████████████████████| 3000/3000 [44:02<00:00,  1.38it/s]/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n",
      "  warnings.warn(_create_warning_msg(\n",
      "***** Running Evaluation *****\n",
      "  Num examples = 50\n",
      "  Batch size = 16\n",
      "\n",
      "  0%|                                                     | 0/4 [00:00<?, ?it/s]\u001b[A\n",
      " 50%|██████████████████████▌                      | 2/4 [00:03<00:03,  1.97s/it]\u001b[A\n",
      " 75%|█████████████████████████████████▊           | 3/4 [00:09<00:03,  3.27s/it]\u001b[A\n",
      "100%|█████████████████████████████████████████████| 4/4 [00:11<00:00,  2.89s/it]\u001b[A\n",
      "{'eval_rouge-1': 32.85251, 'eval_rouge-2': 7.427896, 'eval_rouge-l': 25.328852, 'eval_bleu-4': 0.03374555449894471, 'eval_runtime': 41.1312, 'eval_samples_per_second': 1.216, 'eval_steps_per_second': 0.097, 'epoch': 0.1}\n",
      "\n",
      "100%|███████████████████████████████████████| 3000/3000 [44:43<00:00,  1.38it/s]\u001b[A\n",
      "                                                                                \u001b[ASaving model checkpoint to ./output/tmp-checkpoint-3000\n",
      "/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n",
      "loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--THUDM--chatglm3-6b/snapshots/741f31cf376b07d7b9ed87814bba271bdd49cc16/config.json\n",
      "Model config ChatGLMConfig {\n",
      "  \"_name_or_path\": \"THUDM/chatglm3-6b\",\n",
      "  \"add_bias_linear\": false,\n",
      "  \"add_qkv_bias\": true,\n",
      "  \"apply_query_key_layer_scaling\": true,\n",
      "  \"apply_residual_connection_post_layernorm\": false,\n",
      "  \"architectures\": [\n",
      "    \"ChatGLMModel\"\n",
      "  ],\n",
      "  \"attention_dropout\": 0.0,\n",
      "  \"attention_softmax_in_fp32\": true,\n",
      "  \"auto_map\": {\n",
      "    \"AutoConfig\": \"THUDM/chatglm3-6b--configuration_chatglm.ChatGLMConfig\",\n",
      "    \"AutoModel\": \"THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForConditionalGeneration\",\n",
      "    \"AutoModelForCausalLM\": \"THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForConditionalGeneration\",\n",
      "    \"AutoModelForSeq2SeqLM\": \"THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForConditionalGeneration\",\n",
      "    \"AutoModelForSequenceClassification\": \"THUDM/chatglm3-6b--modeling_chatglm.ChatGLMForSequenceClassification\"\n",
      "  },\n",
      "  \"bias_dropout_fusion\": true,\n",
      "  \"classifier_dropout\": null,\n",
      "  \"eos_token_id\": 2,\n",
      "  \"ffn_hidden_size\": 13696,\n",
      "  \"fp32_residual_connection\": false,\n",
      "  \"hidden_dropout\": 0.0,\n",
      "  \"hidden_size\": 4096,\n",
      "  \"kv_channels\": 128,\n",
      "  \"layernorm_epsilon\": 1e-05,\n",
      "  \"model_type\": \"chatglm\",\n",
      "  \"multi_query_attention\": true,\n",
      "  \"multi_query_group_num\": 2,\n",
      "  \"num_attention_heads\": 32,\n",
      "  \"num_layers\": 28,\n",
      "  \"original_rope\": true,\n",
      "  \"pad_token_id\": 0,\n",
      "  \"padded_vocab_size\": 65024,\n",
      "  \"post_layer_norm\": true,\n",
      "  \"pre_seq_len\": null,\n",
      "  \"prefix_projection\": false,\n",
      "  \"quantization_bit\": 0,\n",
      "  \"rmsnorm\": true,\n",
      "  \"seq_length\": 8192,\n",
      "  \"tie_word_embeddings\": false,\n",
      "  \"torch_dtype\": \"float16\",\n",
      "  \"transformers_version\": \"4.37.2\",\n",
      "  \"use_cache\": true,\n",
      "  \"vocab_size\": 65024\n",
      "}\n",
      "\n",
      "\n",
      "\n",
      "Training completed. Do not forget to share your model on huggingface.co/models =)\n",
      "\n",
      "\n",
      "{'train_runtime': 2684.7212, 'train_samples_per_second': 4.47, 'train_steps_per_second': 1.117, 'train_loss': 3.4455494791666665, 'epoch': 0.1}\n",
      "100%|███████████████████████████████████████| 3000/3000 [44:44<00:00,  1.12it/s]\n",
      "/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n",
      "  warnings.warn(_create_warning_msg(\n",
      "***** Running Prediction *****\n",
      "  Num examples = 1070\n",
      "  Batch size = 16\n",
      "100%|███████████████████████████████████████████| 67/67 [17:47<00:00, 15.93s/it]\n"
     ]
    }
   ],
   "source": [
    "!CUDA_VISIBLE_DEVICES=0 NCCL_P2P_DISABLE=\"1\" NCCL_IB_DISABLE=\"1\" python finetune_hf.py  data/AdvertiseGen_fix  THUDM/chatglm3-6b  configs/lora.yaml"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9418f6c5c264601",
   "metadata": {
    "collapsed": false,
    "id": "d9418f6c5c264601",
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "## 3. Run Inference with the Fine-tuned Model\n",
    "After fine-tuning completes, the `output` folder contains a number of `checkpoint-*` directories, each holding the weights saved at a given training step.\n",
    "We pick the weights from the final checkpoint and load them with the inference script."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "5060015c24e97ae",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-04-14T06:23:52.725227Z",
     "start_time": "2024-04-14T06:23:41.284552Z"
    },
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "5060015c24e97ae",
    "outputId": "d3f03d0d-46bf-4c74-9b00-dc0160da0e15"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n",
      "Loading checkpoint shards: 100%|██████████████████| 7/7 [00:11<00:00,  1.60s/it]\n",
      "/home/ubuntu/miniconda3/envs/chatglm/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n",
      "Setting eos_token is not supported, use the default one.\n",
      "Setting pad_token is not supported, use the default one.\n",
      "Setting unk_token is not supported, use the default one.\n",
      "这款连衣裙采用压褶的工艺，不规则的木耳边设计，让这款连衣裙更具有层次感，而袖口以及领口采用网纱拼接，增添视觉的层次感，更加性感迷人。下摆采用不规则的压褶设计，搭配上拉链的套头，让这款连衣裙更加方便穿脱，而袖口处的设计，更是拉长了整体裙子的长度，更显身材。\n"
     ]
    }
   ],
   "source": [
    "!CUDA_VISIBLE_DEVICES=0 NCCL_P2P_DISABLE=\"1\" NCCL_IB_DISABLE=\"1\" python inference_hf.py output/checkpoint-3000/ --prompt \"类型#裙*版型#显瘦*材质#网纱*风格#性感*裙型#百褶*裙下摆#压褶*裙长#连衣裙*裙衣门襟#拉链*裙衣门襟#套头*裙款式#拼接*裙款式#拉链*裙款式#木耳边*裙款式#抽褶*裙款式#不规则\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18cd83087f096094",
   "metadata": {
    "collapsed": false,
    "id": "18cd83087f096094",
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "## 4. Summary\n",
    "At this point, we have completed LoRA fine-tuning of the ChatGLM3-6B model on a single GPU, enabling it to generate better advertising copy.\n",
    "In this section, you have learned:\n",
    "+ How to fine-tune a model with LoRA\n",
    "+ How to prepare and align a fine-tuning dataset\n",
    "+ How to run inference with the fine-tuned model"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "V100",
   "machine_shape": "hm",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
